[Bioperl-l] Extracting gi no from refseq record

Hilmar Lapp hlapp at gnf.org
Thu Apr 3 15:27:44 EST 2003


The problem is not in SeqIO. The offending statement is in the method  
fetch() in Bio::Index::AbstractSeq, which is the method that does the  
actual retrieval work. The chunk of input is handed off to  
SeqIO::genbank, which I'm sure does the right job, only to be  
overwritten after it returns the parsed sequence object to fetch():

     # we essentially assumme that the primary_id for the database
     # is the display_id
     $seq->primary_id($seq->display_id()) if( defined $seq && ref($seq)  
&&
					     $seq->isa('Bio::PrimarySeqI') );

Can anyone explain the usefulness of this statement? Otherwise we just  
delete it and the bug is fixed. (Siddharta, as an immediate fix you  
might just want to do this.)

	-hilmar

On Thursday, April 3, 2003, at 03:13  PM, Siddhartha Basu wrote:

> Hi hilmar,
>
> Hilmar Lapp wrote:
>> RefSeq does feature this line, at least the last time I checked. We  
>> also do test for this being parsed correctly, namely in t/SeqIO.t.  
>> It's not always as bad as it sometimes seems. I ran an entire  
>> download of RefSeq a couple weeks ago and the GI# was parsed out of  
>> every single record.
>> Siddharta, do you run at least bioperl 1.2?
> Yes i am running bioperl 1.2
>
> If you do, can you run
>>     $ make test_SeqIO
>> from the root directory of the bioperl distribution? Can you please  
>> mail the output?
>
> No tests failed it seems. Here is the output.
> ======================================================================= 
> ===
> PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib  
> -I/usr/lib/perl5/5.8.0/i386-linux-thread-multi -I/usr/lib/perl5/5.8.0  
> -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests  
> @ARGV;' t/SeqIO.t
> t/SeqIO....ok
>         3/146 skipped:
> All tests successful, 3 subtests skipped.
> Files=1, Tests=146,  1 wallclock secs ( 0.88 cusr +  0.03 csys =  0.91  
> CPU)
> ======================================================================= 
> ====
>
> If a test fails, could you please run
>>     $ make test_SeqIO TEST_VERBOSE=1
>> and mail the output for the failed test(s)?
>> If no test fails, could you email the accession# of the sequence you  
>> retrieved and for which there was no GI#?
>
> The accession no i have tried is NP_031416.
> This is a mouse refseq NP no. which i have choosen randomly from mouse  
> refseq flat files. I have downloaded the refseq flat files that ends  
> with gbff and gnp and then used the following code to index all of  
> them.
>
> ======================================================================= 
> ======
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::Index::GenBank;
>
>
> my $IdFile = "$ENV{BIOPERL_INDEX}/genebankall";
>
>
> my $Idx = Bio::Index::GenBank->new( -filename => $IdFile,  
> -write_flag=> 'WRITE');
>
> my $Folder = "/home/basu/scriptanalysis/Genepept";
>
> opendir(FL,$Folder) || die "Can't open folder:$!";
>
> my @Files = grep (!/^\.\.?$/,readdir(FL));
>
>
> my @FullPath = ();
>
> foreach (@Files) { push (@FullPath, "$Folder/$_"); }
>
> $Idx->make_index(@FullPath);
>
> print "Indexed successfully\n";
>
> closedir(FL);
>
> exit;
> ======================================================================= 
> ==========
>
> Then used a simple script for fetching ......
>
> ======================================================================= 
> ===========
> #!/usr/bin/perl -w
> #
>
> use Bio::Index::GenBank;
>
> use strict;
>
> my $GenIndName = "$ENV{BIOPERL_INDEX}/genebankall";
>
> my $Idx = Bio::Index::GenBank->new(-filename => $GenIndName);
>
> my $Seq = $Idx->get_Seq_by_acc('NP_031416');
>
>
>
> print $Seq->primary_id(),"\n";
> exit;
> ======================================================================= 
> ============
>
> The output is Chrna7. My wish is to get the gi no that is 6671503.
>
> So how do i get that value.
>
>
> -siddhartha
>
>
>>     -hilmar
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



More information about the Bioperl-l mailing list