[Bioperl-l] Fwd: Question regarding Bio::GenBank module

Wed Aug 8 19:16:07 UTC 2007

Young -
I'm forwarding to the list for more help.

Begin forwarded message:

> From: "Young Song" <youngcsong at gmail.com>
> Date: August 8, 2007 1:48:29 PM CDT
> To: jason at bioperl.org
> Subject: Question regarding Bio::GenBank module
>
> Hello,
>
>    I am currently located in Vancouver, Canada, and I have a question
> about the Bio::DB::GenBank module for BioPerl.  I read in the online
> documentation for the module
> (http://search.cpan.org/dist/bioperl/Bio/DB/GenBank.pm) that we are not
> supposed to spam NCBI with multiple requests, which led me to think
> about the script that I wrote.  I am trying to extract some information
> based on the FASTA protein files located in NCBI's database.  The
> script reads each '.faa' (FASTA protein) file, takes the 'gi' ID of
> each sequence, and extracts several pieces of information, producing
> output like the following (please note that there are a lot more gi's
> than I am showing you here):
>
> 10954456
> accesstion number: NP_047185.1
> dbsource: GenBank: NC_001911.1
> NP_047185.1
> starting pos. at genomic seq: 1488
> ending pos. at genomic seq: 1991
> strand: +
> description: putative membrane-associated protein
> organism: Buchnera aphidicola
> MERIIEKAIYASRWLMFPVYVGLSFGFILLTLKFFQQIVFIIPDILAMSESGLVLVVLSLIDIALVGGLL 
> VMVMFLGYENFISKMDIQDNEKRLGWMGTMDVNSIKNKVASSIVAISSVHLLRLFMEAEKILDDKIMLCV 
> IIHLTFVLSAFGMAYIDKMSKKKHVLH
> ************************************************
> 10954457
> accesstion number: NP_047186.1
> dbsource: GenBank: NC_001911.1
> NP_047186.1
> starting pos. at genomic seq: 2158
> ending pos. at genomic seq: 2913
> strand: +
> description: putative replication-associated protein
> organism: Buchnera aphidicola
> MPRKNYIYNPKPVFNPPKNKRKISTFICYAMKKASEIDVARSNLNYTLLLIDPKTGNILPRFRRLNEHRA 
> CAMRAIVLAMLYYFDIHSNLVEASIEKLADECGLSTFSDSGNKSITRVSRLINDFLEPMGFVRCKKIKRK 
> FVSNYIPKKIFLTPMFFMLFNISQSKINRYLFKSKKMSQNLKITEKKIFISFSDIKVMSRLDEKSIRKKI 
> LNALINYYTASELTKIGPKGLKKRIDIEYNNLCKLFKKIKK
>
>   Because I am dealing with a lot of sequences here, I am a little bit
> worried that I may be causing harm to the NCBI server.  I just need to
> know if this is the right approach to take, or if there is another
> solution (I am a little bit confused about what you mean by "multiple
> requests" in the documentation).  Your reply would be very much
> appreciated.  Thank you in advance.
>
>   Sincerely,
>
>      Young C. Song

--
Jason Stajich
jason at bioperl.org