[Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl?

Jason Stajich jason.stajich at duke.edu
Tue Jun 21 12:02:30 EDT 2005


There is also a gi2taxid file that you can download and index
locally if you are going to do this a lot. DB_File is useful for
this, as you can tie a hash to the file and reuse the index.

ftp://ftp.ncbi.nih.gov/pub/taxonomy/
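
A minimal sketch of the tie-a-hash idea, assuming the downloaded dump has two whitespace-separated columns per line (GI, then taxon ID). The sub names and file paths are made up for illustration; only DB_File and Fcntl are real modules here.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT, O_RDONLY
use DB_File;    # exports $DB_HASH

# Build (or extend) an on-disk index from a two-column dump: GI, taxon ID.
# $dump and $index are hypothetical paths chosen by the caller.
sub build_gi_index {
    my ($dump, $index) = @_;
    tie my %gi2taxid, 'DB_File', $index, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $index: $!";
    open my $in, '<', $dump or die "Cannot open $dump: $!";
    while (my $line = <$in>) {
        my ($gi, $taxid) = split ' ', $line;
        next unless defined $taxid;    # skip blank or malformed lines
        $gi2taxid{$gi} = $taxid;
    }
    close $in;
    untie %gi2taxid;
    return;
}

# Re-open the existing index read-only and look up a single GI.
sub taxid_for_gi {
    my ($index, $gi) = @_;
    tie my %gi2taxid, 'DB_File', $index, O_RDONLY, 0644, $DB_HASH
        or die "Cannot tie $index: $!";
    my $taxid = $gi2taxid{$gi};
    untie %gi2taxid;
    return $taxid;
}
```

For many lookups it is probably better to tie the hash once and keep it open than to re-tie for every GI as taxid_for_gi does.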

AFAIK there is no direct NCBI utility that takes a GI and returns the
taxon ID for every record. Downloading each sequence record and
parsing it works fine, but it will be slow if you have to do this
over many records.
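
For a short list of GIs, the download-and-parse route could look roughly like the sketch below. It assumes BioPerl is installed and the machine has network access. The two sub names are invented here; get_Seq_by_gi, get_SeqFeatures, has_tag and get_tag_values are BioPerl methods (get_tag_values is the newer name for each_tag_value).

```perl
use strict;
use warnings;

# Pull the numeric taxon ID out of db_xref values such as "taxon:39947".
sub taxid_from_db_xrefs {
    for my $xref (@_) {
        return $1 if $xref =~ /^taxon:(\d+)$/;
    }
    return;
}

# Fetch one record by GI and scan its features for a taxon db_xref.
# Bio::DB::GenBank is loaded on demand, so the helper above can be
# used without BioPerl.
sub taxid_for_gi_remote {
    my ($gi) = @_;
    require Bio::DB::GenBank;
    my $gb  = Bio::DB::GenBank->new;
    my $seq = $gb->get_Seq_by_gi($gi) or return;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('db_xref');
        my $taxid = taxid_from_db_xrefs($feat->get_tag_values('db_xref'));
        return $taxid if defined $taxid;
    }
    return;
}
```

Each call makes one remote request, which is why this is only reasonable for a handful of records.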

The Bio::DB::Taxonomy modules are useful if you want to walk up and
down the taxonomy hierarchy, pull out sub-sections of it, and/or
query for the lowest common ancestor of a set of nodes.
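
As a rough illustration of walking up the hierarchy, here is a sketch that assumes a local copy of NCBI's nodes.dmp and names.dmp and a recent BioPerl, where Bio::DB::Taxonomy offers get_taxon and ancestor (older releases may use different method names). The sub names and the $dir argument are hypothetical.

```perl
use strict;
use warnings;

# Open a taxonomy database over local nodes.dmp / names.dmp files.
# $dir is a hypothetical directory holding both files; BioPerl also
# writes its own index files there on first use.
sub open_taxonomy {
    my ($dir) = @_;
    require Bio::DB::Taxonomy;
    return Bio::DB::Taxonomy->new(
        -source    => 'flatfile',
        -directory => $dir,
        -nodesfile => "$dir/nodes.dmp",
        -namesfile => "$dir/names.dmp",
    );
}

# Walk from a taxon ID up to the root, returning the IDs seen on the way.
sub lineage_ids {
    my ($db, $taxid) = @_;
    my @ids;
    my $taxon = $db->get_taxon(-taxonid => $taxid);
    while ($taxon) {
        push @ids, $taxon->id;
        my $parent = $db->ancestor($taxon);
        # Stop at the root; the second check guards against a root
        # that is recorded as its own parent.
        last if !$parent || $parent->id eq $taxon->id;
        $taxon = $parent;
    }
    return @ids;
}
```

lineage_ids only relies on get_taxon, ancestor and id, so any object offering those methods can stand in for the database.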

I use them together with the indexed gi2taxid file to classify
DB search results by taxonomic group.
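
One way such a classification might be put together, not necessarily how it is done in practice: map each hit's GI to a taxon ID, then climb the hierarchy until a group of interest is reached. To keep the sketch self-contained, plain hash references stand in for the tied GI index and the taxonomy database. All names are hypothetical.

```perl
use strict;
use warnings;

# Count search hits per taxonomic group.
#   $hits      - array ref of GI numbers taken from the search results
#   $gi2taxid  - hash ref (plain or tied) mapping GI to taxon ID
#   $parent_of - hash ref mapping each taxon ID to its parent taxon ID
#   $groups    - hash ref mapping a group's taxon ID to a display name
sub count_hits_by_group {
    my ($hits, $gi2taxid, $parent_of, $groups) = @_;
    my %count;
    for my $gi (@$hits) {
        my $taxid = $gi2taxid->{$gi};
        my $label = 'unclassified';
        my %seen;    # protects against loops such as a self-parented root
        # Climb until a group of interest is found or the chain ends.
        while (defined $taxid && !$seen{$taxid}++) {
            if (exists $groups->{$taxid}) {
                $label = $groups->{$taxid};
                last;
            }
            $taxid = $parent_of->{$taxid};
        }
        $count{$label}++;
    }
    return \%count;
}
```

Hits whose GI is missing from the index, or whose lineage never reaches one of the chosen groups, end up under 'unclassified'.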


-jason

On Jun 21, 2005, at 10:27 AM, michael watson ((IAH-C)) wrote:

> Bio::DB::Query::GenBank can be used to query GenBank, and  
> Bio::DB::GenBank can be used to retrieve records.
>
> After that it depends where the taxon id is stored - if it is  
> stored in the feature table, as in:
>
>                      /mol_type="mRNA"
>                      /cultivar="Nipponbare"
>                      /db_xref="taxon:39947"
>                      /clone="R2345"
>                      /dev_stage="seedling"
>
> Then once you have the Bio::Seq object from Bio::DB::GenBank you  
> can iterate through the feature table and look at each tag-value  
> pair (using the "has_tag" and "each_tag_value" methods) to look for  
> something like db_xref="taxon:39947"
>
> HTH
>
> Mick
>
>
> -----Original Message-----
> From:    bioperl-l-bounces at portal.open-bio.org on behalf of Michael Spitzer
> Sent:    Tue 21/06/2005 2:27 PM
> To:    bioperl-l at bioperl.org
> Cc:
> Subject:    [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl?
> Dear All,
>
> For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the taxon
> ID as given in the corresponding full GenBank record. Which is the
> easiest way to accomplish this task automatically? Does Bioperl help?
> Can one access this function via the NCBI website (possibly, using
> Bioperl)? Or, does one have to download the whole GenBank database?
>
> All I could find out is that there is a function 'gi2taxid' in the NCBI
> toolkit, but I have no experience with using the toolkit, and I hope
> that there is an easier 'Bioperl' way to solve the problem - could
> Bio::DB::NCBIHelper be the way to go? Any help or hints are greatly
> appreciated!
>
> Kind regards,
>
> Michael Spitzer
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/



