[Bioperl-l] Extracting gi no from refseq record

Siddhartha Basu basu at pharm.sunysb.edu
Thu Apr 3 18:13:17 EST 2003


Hi Hilmar,

Hilmar Lapp wrote:
> RefSeq does feature this line, at least the last time I checked. We also 
> do test for this being parsed correctly, namely in t/SeqIO.t. It's not 
> always as bad as it sometimes seems. I ran an entire download of RefSeq 
> a couple weeks ago and the GI# was parsed out of every single record.
> 
> Siddharta, do you run at least bioperl 1.2?

Yes, I am running bioperl 1.2.

> If you do, can you run
> 
>     $ make test_SeqIO
> 
> from the root directory of the bioperl distribution? Can you please mail 
> the output? 

No tests failed, it seems. Here is the output:
==========================================================================
PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib 
-I/usr/lib/perl5/5.8.0/i386-linux-thread-multi -I/usr/lib/perl5/5.8.0 -e 
'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' 
t/SeqIO.t
t/SeqIO....ok
         3/146 skipped:
All tests successful, 3 subtests skipped.
Files=1, Tests=146,  1 wallclock secs ( 0.88 cusr +  0.03 csys =  0.91 CPU)
===========================================================================

> If a test fails, could you please run
> 
>     $ make test_SeqIO TEST_VERBOSE=1
> 
> and mail the output for the failed test(s)?
> 
> If no test fails, could you email the accession# of the sequence you 
> retrieved and for which there was no GI#?

The accession number I tried is NP_031416.
This is a mouse RefSeq protein (NP) accession that I chose at random from
the mouse RefSeq flat files. I downloaded the RefSeq flat files whose names
end in gbff and gnp, and then used the following code to index all of them.

=============================================================================
#!/usr/bin/perl -w

use strict;
use Bio::Index::GenBank;

# The index file lives in the directory named by $BIOPERL_INDEX.
my $IdFile = "$ENV{BIOPERL_INDEX}/genebankall";

my $Idx = Bio::Index::GenBank->new(-filename   => $IdFile,
                                   -write_flag => 'WRITE');

my $Folder = "/home/basu/scriptanalysis/Genepept";

opendir(FL, $Folder) || die "Can't open folder $Folder: $!";

# Skip the "." and ".." directory entries.
my @Files = grep { !/^\.\.?$/ } readdir(FL);

closedir(FL);

# Build the full path of every flat file in the folder.
my @FullPath = map { "$Folder/$_" } @Files;

$Idx->make_index(@FullPath);

print "Indexed successfully\n";

exit;
=================================================================================

Then I used a simple script to fetch a record:

==================================================================================
#!/usr/bin/perl -w

use strict;
use Bio::Index::GenBank;

my $GenIndName = "$ENV{BIOPERL_INDEX}/genebankall";

my $Idx = Bio::Index::GenBank->new(-filename => $GenIndName);

# Look the record up by its accession number.
my $Seq = $Idx->get_Seq_by_acc('NP_031416');

print $Seq->primary_id(), "\n";
exit;
===================================================================================

The output is Chrna7. What I want is the GI number, which is 6671503.

So how do I get that value?
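
Failing a proper BioPerl way, a fallback would be to read the GI number
straight off the VERSION line of the flat-file record. This is only a
minimal sketch: it assumes the record carries a line of the form
"VERSION  <accession.version>  GI:<number>", extract_gi is a hypothetical
helper and not a BioPerl function, and the ".1" version suffix in the
sample record is made up for illustration.

```perl
#!/usr/bin/perl -w

use strict;

# Return the GI number found on the VERSION line of one flat-file record,
# or undef if the record has no "GI:" field on that line.
# (Hypothetical helper for illustration; not part of BioPerl.)
sub extract_gi {
    my ($record) = @_;
    return $record =~ /^VERSION\s+\S+\s+GI:(\d+)/m ? $1 : undef;
}

# Illustrative record fragment. The accession and GI number are the ones
# from this mail; the ".1" version suffix is an assumption.
my $record = <<'END';
ACCESSION   NP_031416
VERSION     NP_031416.1  GI:6671503
END

my $gi = extract_gi($record);
print defined $gi ? "$gi\n" : "no GI found\n";
```

This bypasses the sequence object entirely, so it would be a stopgap and
does not explain why primary_id() returns Chrna7.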


-siddhartha


> 
>     -hilmar
> 



