[Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl?
Jason Stajich
jason.stajich at duke.edu
Tue Jun 21 12:02:30 EDT 2005
There is also a gi2taxid file that you can download and index
locally if you are going to do this a lot. DB_File is useful for
this, as you can tie a hash to the file and re-use the index.
ftp://ftp.ncbi.nih.gov/pub/taxonomy/
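
As a rough illustration of the "index once, re-use the index" idea, here is a minimal sketch in Python rather than Perl. The standard dbm module stands in for a DB_File tied hash. It assumes the downloaded file is plain text with two whitespace-separated columns per line (gi, then taxid); the function names build_index and lookup_taxid and the file paths are hypothetical, not part of any NCBI or BioPerl tool.

```python
import dbm


def build_index(dump_path, index_path):
    """Index a two-column 'gi taxid' text dump into an on-disk key/value store.

    Assumption: each useful line holds exactly two whitespace-separated
    fields, the gi followed by the taxid.
    """
    # "n" always creates a new, empty index, replacing any existing one.
    with open(dump_path) as dump, dbm.open(index_path, "n") as index:
        for line in dump:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            gi, taxid = fields
            index[gi] = taxid


def lookup_taxid(index_path, gi):
    """Return the taxid for a gi as an int, or None if the gi is not indexed."""
    with dbm.open(index_path, "r") as index:
        try:
            return int(index[str(gi)])
        except KeyError:
            return None
```

Build the index once with build_index("gi_taxid.dmp", "gi2taxid.idx") (both paths are placeholders), then call lookup_taxid("gi2taxid.idx", gi) for each gi. Later runs can skip the build step and open the existing index directly.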
AFAIK there is no direct NCBI utility that takes a gi and returns the
taxonid for every record. Downloading each sequence record and
parsing it works fine, but will be slow if you have to do it for
many records.
The Bio::DB::Taxonomy modules are useful if you want to walk up and
down the taxonomy hierarchy and get sub-sections and/or query for the
least common node.
I use it in conjunction with the (indexed) gi2taxid file to classify
DB search results by taxonomic group.
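
To make the walk-up-the-hierarchy idea concrete, here is a small Python sketch. It is not the Bio::DB::Taxonomy API. It assumes the taxonomy is available as a child-to-parent mapping in which the root is its own parent; the ids in TOY_PARENT are made up for illustration and are not real NCBI taxids, and all function names are hypothetical.

```python
# Toy child -> parent map. The ids are invented, not real NCBI taxids.
# Assumption: the root node (1) lists itself as its parent.
TOY_PARENT = {1: 1, 10: 1, 20: 1, 11: 10, 12: 10, 21: 20}


def lineage(taxid, parent):
    """Return the path from taxid up to the root, both ends included."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lowest_common_node(a, b, parent):
    """Return the deepest node that lies on the lineages of both a and b."""
    ancestors_of_a = set(lineage(a, parent))
    for node in lineage(b, parent):  # walks upward, so the first hit is deepest
        if node in ancestors_of_a:
            return node
    return None


def group_by_ancestor(taxids, groups, parent):
    """Bucket each taxid under the first group id found in its lineage.

    Taxids that fall under none of the groups are collected under None.
    """
    buckets = {}
    for taxid in taxids:
        path = set(lineage(taxid, parent))
        hit = next((g for g in groups if g in path), None)
        buckets.setdefault(hit, []).append(taxid)
    return buckets
```

With the toy map, lowest_common_node(11, 12, TOY_PARENT) gives 10 and lowest_common_node(11, 21, TOY_PARENT) gives 1. Feeding the taxids looked up for a set of search hits into group_by_ancestor sorts the hits by taxonomic group.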
-jason
On Jun 21, 2005, at 10:27 AM, michael watson (IAH-C) wrote:
> Bio::DB::Query::GenBank can be used to query GenBank, and
> Bio::DB::GenBank can be used to retrieve records.
>
> After that it depends where the taxon id is stored - if it is
> stored in the feature table, as in:
>
> /mol_type="mRNA"
> /cultivar="Nipponbare"
> /db_xref="taxon:39947"
> /clone="R2345"
> /dev_stage="seedling"
>
> Then once you have the Bio::Seq object from Bio::DB::GenBank you
> can iterate through the feature table and look at each tag-value
> pair (using the "has_tag" and "each_tag_value" methods) to look for
> something like db_xref="taxon:39947"
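
For a record that has already been retrieved, the same lookup can be sketched without BioPerl by scanning the flat-file text for the qualifier. This Python sketch assumes the record text is held in a string and that the taxon id appears as /db_xref="taxon:NNN", as in the excerpt above; the function name taxon_ids is hypothetical.

```python
import re

# Matches a feature-table qualifier such as /db_xref="taxon:39947"
TAXON_XREF = re.compile(r'/db_xref="taxon:(\d+)"')


def taxon_ids(record_text):
    """Return the taxon ids found in db_xref qualifiers.

    Ids are returned as ints in order of first appearance, without duplicates.
    """
    seen = []
    for match in TAXON_XREF.finditer(record_text):
        taxid = int(match.group(1))
        if taxid not in seen:
            seen.append(taxid)
    return seen
```

Applied to the five qualifier lines quoted above, taxon_ids returns [39947].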
>
> HTH
>
> Mick
>
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org on behalf of Michael Spitzer
> Sent: Tue 21/06/2005 2:27 PM
> To: bioperl-l at bioperl.org
> Cc:
> Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl?
> Dear All,
>
> For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the
> taxon ID as given in the corresponding full GenBank record. Which is
> the easiest way to accomplish this task automatically? Does Bioperl
> help? Can one access this function via the NCBI website (possibly,
> using Bioperl)? Or, does one have to download the whole GenBank
> database?
>
> All I could find out is that there is a function 'gi2taxid' in the
> NCBI toolkit, but I have no experience with using the toolkit, and I
> hope that there is an easier 'Bioperl' way to solve the problem -
> could Bio::DB::NCBIHelper be the way to go? Any help or hints are
> greatly appreciated!
>
> Kind regards,
>
> Michael Spitzer
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/