[Bioperl-l] Species retrieval from NCBI nr protein database.
Navdeep Jaitly
ndjaitly@hotmail.com
Mon, 29 Jul 2002 10:22:59 -0400
Hi!
I am using Bio::SeqIO to read proteins from the NCBI nr database.
Unfortunately, the species field does not seem to be parsed: it gets lumped
in with the description field (in nr headers the species is usually the
last element of the header and is enclosed in square brackets []). Is this
expected, or am I doing something wrong? Can the parsing of the fields be
specified when creating a SeqIO instance?
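One possible workaround, sketched here under the assumption that the species is always the final bracketed group at the end of the description (as in the records below), is to split it off with a regular expression after SeqIO has read the record. The helper name `split_species` is hypothetical and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): split an nr-style
# description into (description, species), assuming the species is the
# final "[...]" group at the very end of the string.
sub split_species {
    my ($desc) = @_;
    # [^\[\]]+ cannot span brackets, so with the trailing $ anchor only
    # the LAST bracketed group can match; the lazy (.*?) keeps the rest.
    if ( defined $desc && $desc =~ /^(.*?)\s*\[([^\[\]]+)\]\s*$/ ) {
        return ( $1, $2 );
    }
    return ( $desc, undef );    # no trailing [species] found
}

my ( $d, $sp ) = split_species('(X60065) beta-2-glycoprotein I [Bos taurus]');
print "DESCRIPTION: $d\nSPECIES: $sp\n";
```

Inside the reading loop this would be called on `$seq->desc()`; descriptions without a trailing bracket come back unchanged with an undefined species.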
Thanks!
Deep
P.S. The code and its output are below.
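use strict ;
use warnings ;
use Bio::SeqIO;

# Open the nr FASTA file ($in must be declared with my under use strict).
my $in = Bio::SeqIO->new('-file'   => "c:\\Databases\\nr.fas",
                         '-format' => 'Fasta');
my $TO_PRINT = 3 ;
my $numProteins = 0 ;
# Test the counter first so no record is read past the last one printed.
while ( $numProteins < $TO_PRINT && (my $seq = $in->next_seq()) )
{
    my $sequence = $seq->seq() ;
    my $name = $seq->display_id() ;
    # species() returns a Bio::Species object, or undef if none was parsed.
    my $species = $seq->species() ? $seq->species()->binomial() : '' ;
    my $description = $seq->desc() ;
    print "NAME: $name\n" ;
    print "SPECIES: $species\n" ;
    print "DESCRIPTION: $description\n" ;
    print "SEQUENCE: $sequence\n\n" ;
    $numProteins++ ;
}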
NAME: gi|6|emb|CAA42669.1|
SPECIES:
DESCRIPTION: (X60065) beta-2-glycoprotein I [Bos taurus]
SEQUENCE:
PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQIVFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNTISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNNSFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPANPVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGERVAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKTDASDVKPC
NAME: gi|129249|sp|P02820|OSTC_BOVIN
SPECIES:
DESCRIPTION: OSTEOCALCIN PRECURSOR (GAMMA-CARBOXYGLUTAMIC ACID-CONTAINING PROTEIN) (BONE GLA-PROTEIN) (BGP)gi|538590|pir||GEBO osteocalcin precursor - bovinegi|8|emb|CAA35997.1| (X51700) bone Gla precursor (100 AA) [Bos taurus]gi|720|emb|CAA37737.1| (X53699) Gla protein precusor [Bos taurus]
SEQUENCE:
MRTPMLLALLALATLCLAGRADAKPGDAESGKGAAFVSKQEGSEVVKRLRRYLDHWLGAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV
NAME: gi|231734|sp|P30274|CGA2_BOVIN
SPECIES:
DESCRIPTION: CYCLIN A2 (CYCLIN A)gi|284597|pir||S24788 cyclin A - bovinegi|10|emb|CAA48398.1| (X68321) Cyclin A-3 [Bos taurus]
SEQUENCE:
EFQEDQENVNPEKAAPAQQPRTRAGLAVLRAGNSRGPAPQRPKTRRVAPLKDLPINDEYVPVPPWKANNKQPAFTIHVDEAEEIQKRPTESKKSESEDVLAFNSAVTLPGPRKPLAPLDYPMDGSFESPHTMEMSVVLEDEKPVSVNEVPDYHEDIHTYLREMEVKCKPKVGYMKKQPDITNSMRAILVDWLVEVGEEYKLQNETLHLAVNYIDRFLSSMSVLRGKLQLVGTAAMLLASKFEEIYPPEVAEFVYITDDTYTKKQVLRMEHLVLKVLAFDLAAPTINQFLTQYFLHQQPANCKVESLAMFLGELSLIDADPYLKYLPSVIAAAAFHLALYTVTGQSWPESLVQKTGYTLETLKPCLLDLHQTYLRAPQHAQQSIREKYKNSKYHGVSLLNPPETLNV
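The second and third records show a further complication: nr merges identical sequences into one record, and the merged deflines are joined by Control-A characters (ASCII 0x01), which do not show up when printed, hence text such as `(BGP)gi|538590|...` running together above. A minimal sketch, assuming that separator and the trailing-bracket species convention, splits the description and collects the distinct species named in its parts. The sub name `species_from_nr_desc` is hypothetical, and the sample string assumes the invisible separators in the cyclin record were Control-A.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the distinct species
# named in an nr description, in first-seen order. Assumes merged
# deflines are separated by Control-A (\x01) and that each species is a
# trailing "[...]" group; parts without one (e.g. pir entries) are skipped.
sub species_from_nr_desc {
    my ($desc) = @_;
    my ( %seen, @species );
    for my $part ( split /\x01/, $desc ) {
        next unless $part =~ /\[([^\[\]]+)\]\s*$/;
        push @species, $1 unless $seen{$1}++;
    }
    return @species;
}

# The CYCLIN A2 description above, with \x01 separators assumed restored.
my $desc = "CYCLIN A2 (CYCLIN A)\x01gi|284597|pir||S24788 cyclin A - bovine"
         . "\x01gi|10|emb|CAA48398.1| (X68321) Cyclin A-3 [Bos taurus]";
print join( ', ', species_from_nr_desc($desc) ), "\n";
```

Because a merged record can come from several organisms, returning a list rather than a single species seems the safer choice here.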