[Bioperl-l] Extracting gi no from refseq record
Hilmar Lapp
hlapp at gnf.org
Thu Apr 3 15:27:44 EST 2003
The problem is not in SeqIO. The offending statement is in the method
fetch() in Bio::Index::AbstractSeq, which is the method that does the
actual retrieval work. The chunk of input is handed off to
SeqIO::genbank, which I'm sure does the right job, only to be
overwritten after it returns the parsed sequence object to fetch():
# we essentially assumme that the primary_id for the database
# is the display_id
$seq->primary_id($seq->display_id()) if( defined $seq && ref($seq)
&&
$seq->isa('Bio::PrimarySeqI') );
Can anyone explain the usefulness of this statement? Otherwise we just
delete it and the bug is fixed. (Siddharta, as an immediate fix you
might just want to do this.)
-hilmar
On Thursday, April 3, 2003, at 03:13 PM, Siddhartha Basu wrote:
> Hi hilmar,
>
> Hilmar Lapp wrote:
>> RefSeq does feature this line, at least the last time I checked. We
>> also do test for this being parsed correctly, namely in t/SeqIO.t.
>> It's not always as bad as it sometimes seems. I ran an entire
>> download of RefSeq a couple weeks ago and the GI# was parsed out of
>> every single record.
>> Siddharta, do you run at least bioperl 1.2?
> Yes i am running bioperl 1.2
>
> If you do, can you run
>> $ make test_SeqIO
>> from the root directory of the bioperl distribution? Can you please
>> mail the output?
>
> No tests failed it seems. Here is the output.
> =======================================================================
> ===
> PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib
> -I/usr/lib/perl5/5.8.0/i386-linux-thread-multi -I/usr/lib/perl5/5.8.0
> -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests
> @ARGV;' t/SeqIO.t
> t/SeqIO....ok
> 3/146 skipped:
> All tests successful, 3 subtests skipped.
> Files=1, Tests=146, 1 wallclock secs ( 0.88 cusr + 0.03 csys = 0.91
> CPU)
> =======================================================================
> ====
>
> If a test fails, could you please run
>> $ make test_SeqIO TEST_VERBOSE=1
>> and mail the output for the failed test(s)?
>> If no test fails, could you email the accession# of the sequence you
>> retrieved and for which there was no GI#?
>
> The accession no i have tried is NP_031416.
> This is a mouse refseq NP no. which i have choosen randomly from mouse
> refseq flat files. I have downloaded the refseq flat files that ends
> with gbff and gnp and then used the following code to index all of
> them.
>
> =======================================================================
> ======
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::Index::GenBank;
>
>
> my $IdFile = "$ENV{BIOPERL_INDEX}/genebankall";
>
>
> my $Idx = Bio::Index::GenBank->new( -filename => $IdFile,
> -write_flag=> 'WRITE');
>
> my $Folder = "/home/basu/scriptanalysis/Genepept";
>
> opendir(FL,$Folder) || die "Can't open folder:$!";
>
> my @Files = grep (!/^\.\.?$/,readdir(FL));
>
>
> my @FullPath = ();
>
> foreach (@Files) { push (@FullPath, "$Folder/$_"); }
>
> $Idx->make_index(@FullPath);
>
> print "Indexed successfully\n";
>
> closedir(FL);
>
> exit;
> =======================================================================
> ==========
>
> Then used a simple script for fetching ......
>
> =======================================================================
> ===========
> #!/usr/bin/perl -w
> #
>
> use Bio::Index::GenBank;
>
> use strict;
>
> my $GenIndName = "$ENV{BIOPERL_INDEX}/genebankall";
>
> my $Idx = Bio::Index::GenBank->new(-filename => $GenIndName);
>
> my $Seq = $Idx->get_Seq_by_acc('NP_031416');
>
>
>
> print $Seq->primary_id(),"\n";
> exit;
> =======================================================================
> ============
>
> The output is Chrna7. My wish is to get the gi no that is 6671503.
>
> So how do i get that value.
>
>
> -siddhartha
>
>
>> -hilmar
>
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the Bioperl-l
mailing list