[Bioperl-l] Fwd: Question regarding Bio::GenBank module
Jason Stajich
jason at bioperl.org
Wed Aug 8 19:16:07 UTC 2007
Young -
I'm forwarding to the list for more help.
Begin forwarded message:
> From: "Young Song" <youngcsong at gmail.com>
> Date: August 8, 2007 1:48:29 PM CDT
> To: jason at bioperl.org
> Subject: Question regarding Bio::GenBank module
>
> Hello,
>
> I am currently located in Vancouver, Canada, and I have a question
> about the Bio::DB::GenBank module for BioPerl. I read in the online
> documentation for the module
> (http://search.cpan.org/dist/bioperl/Bio/DB/GenBank.pm) that we are not
> supposed to spam NCBI with multiple requests, which led me to think
> about the script that I wrote. I am trying to extract some information
> based on the FASTA protein files located in NCBI's database. The script
> reads each '.faa' (FASTA protein) file, takes the 'gi' ID of each
> sequence, and extracts several pieces of information, which look like
> the following output (please note that there are a lot more gi's than
> I am showing you here):
>
> 10954456
> accesstion number: NP_047185.1
> dbsource: GenBank: NC_001911.1
> NP_047185.1
> starting pos. at genomic seq: 1488
> ending pos. at genomic seq: 1991
> strand: +
> description: putative membrane-associated protein
> organism: Buchnera aphidicola
> MERIIEKAIYASRWLMFPVYVGLSFGFILLTLKFFQQIVFIIPDILAMSESGLVLVVLSLIDIALVGGLL
> VMVMFLGYENFISKMDIQDNEKRLGWMGTMDVNSIKNKVASSIVAISSVHLLRLFMEAEKILDDKIMLCV
> IIHLTFVLSAFGMAYIDKMSKKKHVLH
> ************************************************
> 10954457
> accesstion number: NP_047186.1
> dbsource: GenBank: NC_001911.1
> NP_047186.1
> starting pos. at genomic seq: 2158
> ending pos. at genomic seq: 2913
> strand: +
> description: putative replication-associated protein
> organism: Buchnera aphidicola
> MPRKNYIYNPKPVFNPPKNKRKISTFICYAMKKASEIDVARSNLNYTLLLIDPKTGNILPRFRRLNEHRA
> CAMRAIVLAMLYYFDIHSNLVEASIEKLADECGLSTFSDSGNKSITRVSRLINDFLEPMGFVRCKKIKRK
> FVSNYIPKKIFLTPMFFMLFNISQSKINRYLFKSKKMSQNLKITEKKIFISFSDIKVMSRLDEKSIRKKI
> LNALINYYTASELTKIGPKGLKKRIDIEYNNLCKLFKKIKK
>
> Because I am dealing with a lot of sequences here, I am a little
> worried that I may be causing harm to the NCBI server. I just need to
> know if this is the right approach to take, or if there is another
> solution (I am a little confused about what you mean by "multiple
> requests" in the documentation). Your reply would be very much
> appreciated. Thank you in advance.
>
> Sincerely,
>
> Young C. Song
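
One possible reading of "multiple requests" is one request to NCBI per gi. A minimal sketch of the alternative, collecting the gi IDs from the .faa headers first and grouping them into batches so that each batch could be fetched in a single request, might look like the following. It is in Python purely for illustration and makes no network calls. The header layout `>gi|NUMBER|...`, the batch size of 100, and the names `extract_gis` and `batches` are all assumptions, not taken from the script above or from the module documentation. In BioPerl, each batch would presumably be handed to a stream method such as `get_Stream_by_id` rather than fetched one ID at a time.

```python
import re
from typing import Iterable, Iterator, List

# Assumed header layout: ">gi|10954456|ref|NP_047185.1| description"
GI_PATTERN = re.compile(r"^>gi\|(\d+)\|")


def extract_gis(lines: Iterable[str]) -> List[str]:
    """Collect gi IDs from FASTA header lines, skipping sequence lines."""
    gis = []
    for line in lines:
        match = GI_PATTERN.match(line)
        if match:
            gis.append(match.group(1))
    return gis


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs (size is an assumed default)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    # Headers below are hypothetical; only the gi and accession values
    # come from the sample output in the message.
    sample = [
        ">gi|10954456|ref|NP_047185.1| putative membrane-associated protein",
        "MERIIEKAIYASRWLMFPVYVGLSFGFILLTLKFFQQ",
        ">gi|10954457|ref|NP_047186.1| putative replication-associated protein",
        "MPRKNYIYNPKPVFNPPKNKRKISTFICYAMKKASEID",
    ]
    for batch in batches(extract_gis(sample)):
        # One request per batch would go here, with a pause between batches.
        print(len(batch), batch)
```

With this arrangement the number of requests grows with the number of batches rather than with the number of sequences, which is the property that matters for the server.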
--
Jason Stajich
jason at bioperl.org