[Bioperl-l] Species retrieval from NCBI nr protein database.

Navdeep Jaitly ndjaitly@hotmail.com
Mon, 29 Jul 2002 10:22:59 -0400


Hi!
I was using SeqIO to read proteins from the NCBI nr database. Unfortunately,
the species field does not seem to be parsed: it gets lumped in with the
description field (the species is usually the last element of an nr header
and is enclosed in []). Is this to be expected, or am I doing something
wrong? Can the parsing of the fields be specified when creating a SeqIO
instance?
Thanks!
Deep

PS: Code and results attached.



use Bio::SeqIO;
use strict ;
# $in must be declared with "my", otherwise "use strict" aborts compilation.
my $in  = Bio::SeqIO->new('-file' => "c:\\Databases\\nr.fas",
                         '-format' => 'Fasta');
my $TO_PRINT = 3 ;
my $numProteins = 0 ;
my $seq ;
# Check the counter first so that no extra record is read after the last print.
while ( $numProteins < $TO_PRINT && ($seq = $in->next_seq()) )
{
	my $sequence = $seq->seq() ;
	my $name = $seq->display_id() ;
	my $species = $seq->species() ;
	my $description = $seq->desc() ;
	print "NAME: $name\n" ;
	print "SPECIES: $species\n" ;
	print "DESCRIPTION: $description\n" ;
	print "SEQUENCE: $sequence\n\n" ;
	$numProteins++ ;
}


NAME: gi|6|emb|CAA42669.1|
SPECIES:
DESCRIPTION: (X60065) beta-2-glycoprotein  I [Bos taurus]
SEQUENCE: 
PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQIVFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNTISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNNSFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPANPVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGERVAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKTDASDVKPC

NAME: gi|129249|sp|P02820|OSTC_BOVIN
SPECIES:
DESCRIPTION: OSTEOCALCIN PRECURSOR (GAMMA-CARBOXYGLUTAMIC ACID-CONTAINING 
PROTEIN) (BONE GLA-PROTEIN) (BGP)gi|538590|pir||GEBO osteocalcin precursor 
- bovinegi|8|emb|CAA35997.1| (X51700) bone Gla precursor (100 AA) [Bos 
taurus]gi|720|emb|CAA37737.1| (X53699) Gla protein precusor [Bos taurus]
SEQUENCE: 
MRTPMLLALLALATLCLAGRADAKPGDAESGKGAAFVSKQEGSEVVKRLRRYLDHWLGAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV

NAME: gi|231734|sp|P30274|CGA2_BOVIN
SPECIES:
DESCRIPTION: CYCLIN A2 (CYCLIN A)gi|284597|pir||S24788 cyclin A - 
bovinegi|10|emb|CAA48398.1| (X68321) Cyclin A-3 [Bos taurus]
SEQUENCE: 
EFQEDQENVNPEKAAPAQQPRTRAGLAVLRAGNSRGPAPQRPKTRRVAPLKDLPINDEYVPVPPWKANNKQPAFTIHVDEAEEIQKRPTESKKSESEDVLAFNSAVTLPGPRKPLAPLDYPMDGSFESPHTMEMSVVLEDEKPVSVNEVPDYHEDIHTYLREMEVKCKPKVGYMKKQPDITNSMRAILVDWLVEVGEEYKLQNETLHLAVNYIDRFLSSMSVLRGKLQLVGTAAMLLASKFEEIYPPEVAEFVYITDDTYTKKQVLRMEHLVLKVLAFDLAAPTINQFLTQYFLHQQPANCKVESLAMFLGELSLIDADPYLKYLPSVIAAAAFHLALYTVTGQSWPESLVQKTGYTLETLKPCLLDLHQTYLRAPQHAQQSIREKYKNSKYHGVSLLNPPETLNV
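
Since the species ends up at the end of the description in the results above,
one possible workaround is to pull it out of the description string with a
regular expression. The following is only a minimal sketch: the helper name
species_from_desc is hypothetical and not part of BioPerl, and it assumes the
species is the last [...] group at the very end of the description. For merged
nr entries (several headers run together, as in the second and third records)
it would return only the species of the last entry.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the text inside the
# final [...] group at the end of a description, or undef if there is none.
sub species_from_desc {
    my ($desc) = @_;
    return undef unless defined $desc;
    # Capture bracket contents anchored at the end of the string,
    # allowing trailing whitespace after the closing bracket.
    return $desc =~ /\[([^\[\]]+)\]\s*$/ ? $1 : undef;
}

# Description taken from the first record in the results above.
my $desc    = '(X60065) beta-2-glycoprotein  I [Bos taurus]';
my $species = species_from_desc($desc);
print 'SPECIES: ', (defined $species ? $species : 'unknown'), "\n";
```

In the script above, this would be called as species_from_desc($seq->desc())
in place of $seq->species().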

