[Bioperl-l] Bio::SeqIO; seq->desc() gives back too (!!!) full header
Benjamin Breu
breu@proteosys.com
Thu, 8 Aug 2002 10:01:10 +0200
Hi,
Thanks, Jason, for your help.
The desc() function prints the header, but it returns too much. I expected only the description of the first entry, but when one protein has multiple gi numbers (I'm using the NCBI nr FASTA file), desc() returns that description followed by every further gi, pir, etc. identifier and its description. See below.
use Bio::SeqIO;
# 'filename' stands for my input file
my $in = Bio::SeqIO->new(-format => 'fasta', -file => 'filename');
while ( my $seq = $in->next_seq ) {
    print $seq->display_id(), "\n", $seq->desc(), "\n", $seq->seq(), "\n\n";
}
The output is formatted as follows:
ID
description
sequence
gi|15233744|ref|NP_194152.1|
(NM_118554) putative protein [Arabidopsis thaliana]gi|7487330|pir||T09884 hypothetical protein T22A6.40 - Arabidopsis thalianagi|5051763|emb|CAB45056.1| (AL078637) putative protein [Arabidopsis thaliana]gi|7269271|emb|CAB79331.1| (AL161561) putative protein [Arabidopsis thaliana]
MKRSTTDSDLAGDAHNETNKKMKSTEEEEIGFSNLDENLVYEVLKHVDAKTLAMSSCVSKIWHKTAQDERLWELICTRHWTNIGCGQNQLRSVVLALGGFRRLHSLYLWPLSKPNPRARFGKDELKLTLSLLSIRYYEKMSFTKRPLPESK
Is there a problem with the parser, or is there an option that makes it return all the gi, pir, etc. identifiers when I ask for the ID? The result could be a hash with key = database (e.g. dbj, pir) and value = an array of identifiers. Is there such a smart little parser, or do I have to spend (a lot of) hours writing this myself?
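For reference, here is a minimal sketch of the kind of parser meant here, in plain Perl. It is not a BioPerl API: the helper names first_description and parse_ncbi_ids are made up for illustration. It assumes each merged entry starts with gi|<number>|<db>|<accession>|<name>, and that entries in the raw nr file may be separated by Ctrl-A (\x01), which does not show up in the pasted output above.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: keep only the first entry's
# description by cutting at a Ctrl-A separator or at the next "gi|<number>|".
sub first_description {
    my ($desc) = @_;
    $desc =~ s/(?:\x01|gi\|\d+\|).*//s;
    return $desc;
}

# Hypothetical helper, not a BioPerl method: collect every identifier in the
# header into a hash of database tag => array of identifiers.
sub parse_ncbi_ids {
    my ($header) = @_;
    my %ids;
    # Either <accession> or <name> may be empty, e.g. "pir||T09884".
    while ( $header =~ /gi\|(\d+)\|([a-z]+)\|([^|\s\x01]*)\|([^|\s\x01]*)/g ) {
        my ( $gi, $db, $acc, $name ) = ( $1, $2, $3, $4 );
        push @{ $ids{gi} },  $gi;
        push @{ $ids{$db} }, ( length($acc) ? $acc : $name );
    }
    return \%ids;
}

# The two strings below are the display_id() and desc() values shown above.
my $id   = 'gi|15233744|ref|NP_194152.1|';
my $desc = '(NM_118554) putative protein [Arabidopsis thaliana]'
         . 'gi|7487330|pir||T09884 hypothetical protein T22A6.40 - Arabidopsis thaliana'
         . 'gi|5051763|emb|CAB45056.1| (AL078637) putative protein [Arabidopsis thaliana]'
         . 'gi|7269271|emb|CAB79331.1| (AL161561) putative protein [Arabidopsis thaliana]';

print first_description($desc), "\n";

my $ids = parse_ncbi_ids("$id $desc");
for my $db ( sort keys %$ids ) {
    print "$db: ", join( ', ', @{ $ids->{$db} } ), "\n";
}
```

Inside the next_seq loop, the same call would be parse_ncbi_ids( $seq->display_id . ' ' . $seq->desc ). The regex only covers the identifier layouts seen in this one header; other database tags may use a different number of fields.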
Thx
Ben