[Bioperl-l] how to retrieve organism name from accession number?

Chris Fields cjfields at illinois.edu
Sun Jan 10 20:49:40 UTC 2010


One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details).

chris
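
A minimal sketch of the Bio::DB::Taxonomy route mentioned above, assuming BioPerl is installed. organism_for_taxid() and its taxdump_dir/index_dir options are hypothetical names made up for illustration; the constructor arguments follow the Bio::DB::Taxonomy POD, which is worth checking against your installed version.

```perl
use strict;
use warnings;

# Look up the scientific name for one NCBI taxid via Bio::DB::Taxonomy.
# organism_for_taxid() is a hypothetical helper name, not part of BioPerl.
sub organism_for_taxid {
    my ($taxid, %opt) = @_;

    # Loaded lazily so this file still compiles where BioPerl is absent.
    require Bio::DB::Taxonomy;

    # With a taxdump directory, index the local files; otherwise query Entrez.
    my $db = $opt{taxdump_dir}
        ? Bio::DB::Taxonomy->new(
              -source    => 'flatfile',
              -directory => $opt{index_dir} || $opt{taxdump_dir},
              -nodesfile => "$opt{taxdump_dir}/nodes.dmp",
              -namesfile => "$opt{taxdump_dir}/names.dmp",
          )
        : Bio::DB::Taxonomy->new(-source => 'entrez');

    my $taxon = $db->get_taxon(-taxonid => $taxid);
    return defined $taxon ? $taxon->scientific_name : undef;
}
```

Calling organism_for_taxid(83332) would go to Entrez; passing taxdump_dir => '/path/to/taxdump' would use the unpacked taxdump.tar.gz instead (the path is a placeholder).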

On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:

> An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
> In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms)
> 
> If it were me, I'd load gi_taxid_nucl and names.dmp into hashes so I could do this (note that the gi_taxid files are keyed by GI number, not accession):
> 
>   my $taxid    = $gi_taxid_nucl{$gi};
>   my $org_name = $names{$taxid};
> 
> --Russell
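
The hash approach Russell describes could be sketched roughly as follows. This assumes the unpacked gi_taxid dump is two tab-separated columns (GI, taxid) and that names.dmp uses the "\t|\t" field separator with a name-class column. load_gi_taxid() and load_names() are hypothetical helper names, and the sample rows are illustrative, not copied from the real files.

```perl
use strict;
use warnings;

# Build a gi => taxid hash from a gi_taxid dump handle (two tab-separated columns).
sub load_gi_taxid {
    my ($fh) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        $taxid_for{$gi} = $taxid if defined $taxid;
    }
    return \%taxid_for;
}

# Build a taxid => scientific name hash from a names.dmp handle.
# Fields are separated by "\t|\t" and each line ends with "\t|".
sub load_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        # Skip synonyms, common names, etc.; keep one name per taxid.
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

# Tiny in-memory samples standing in for the real (very large) files.
my $gi_data    = "1621261\t83332\n";
my $names_data = join '',
    "83332\t|\tMycobacterium tuberculosis H37Rv\t|\t\t|\tscientific name\t|\n",
    "83332\t|\tM. tuberculosis H37Rv\t|\t\t|\tsynonym\t|\n";

open my $gi_fh,    '<', \$gi_data    or die "cannot open sample: $!";
open my $names_fh, '<', \$names_data or die "cannot open sample: $!";

my $taxid_for = load_gi_taxid($gi_fh);
my $name_for  = load_names($names_fh);

my $gi       = 1621261;
my $organism = $name_for->{ $taxid_for->{$gi} };
print "$gi => $organism\n";
```

With the real files, the handles would come from open() on the unpacked dumps. The full gi_taxid table is large, so it may be worth keeping only the GIs you actually need.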
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>> Sent: Saturday, 26 December 2009 4:52 p.m.
>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>> 
>> Bhakti,
>> The following example (using EUtilities) may serve your purpose:
>> 
>> use Bio::DB::EUtilities;
>> 
>> my (%taxa, @taxa);
>> my (%names, %idmap);
>> 
>> # these are protein ids; nuc ids will (probably) work by changing
>> # -dbfrom => 'nucleotide'
>> 
>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>> 
>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>                                       -db => 'taxonomy',
>>                                       -dbfrom => 'protein',
>>                                       -correspondence => 1,
>>                                       -id => \@ids);
>> 
>> # iterate through the LinkSet objects
>> while (my $ds = $factory->next_LinkSet) {
>>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>> }
>> 
>> # keep only ids that mapped to a taxon (removed records yield undef)
>> @taxa = grep { defined } @taxa{@ids};
>> 
>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>        -db    => 'taxonomy',
>>        -id    => \@taxa );
>> 
>> while (my $docsum = $factory->next_DocSum) {
>>    my ($taxid) = $docsum->get_contents_by_name('TaxId');
>>    my ($name)  = $docsum->get_contents_by_name('ScientificName');
>>    $names{$taxid} = $name;
>> }
>> 
>> foreach (@ids) {
>>    $idmap{$_} = $names{$taxa{$_}};
>> }
>> 
>> # %idmap is
>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>> #    68536103 => 'Corynebacterium jeikeium K411'
>> #    730439 => 'Bacillus caldolyticus'
>> #    89318838 => undef    (this record has been removed from the db)
>> 
>> 1;
>> 
>> You probably will need to break up your 30000 into chunks
>> (say, 1000-3000 each), and do the above on each chunk with a
>> 
>> sleep 3;
>> 
>> or so separating the queries.
>> MAJ
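
The chunking advice above could look something like this. It is a sketch only: for_each_chunk() is a hypothetical helper, the ids are placeholders, and the callback is where the elink/esummary queries from the example would go. The pause is set to 0 here so the demo runs instantly; something like 3 seconds matches the suggestion above.

```perl
use strict;
use warnings;

# Split a list of ids into batches of at most $size and call $callback on
# each batch, sleeping $pause seconds between batches.
# for_each_chunk() is a hypothetical helper, not part of BioPerl.
sub for_each_chunk {
    my ($ids, $size, $pause, $callback) = @_;
    my @queue = @$ids;            # copy so the caller's array is left untouched
    my $first = 1;
    while (my @chunk = splice @queue, 0, $size) {
        sleep $pause if !$first && $pause;
        $first = 0;
        $callback->(\@chunk);
    }
    return;
}

my @all_ids = (1 .. 2500);        # placeholder ids
my @sizes;
for_each_chunk(\@all_ids, 1000, 0, sub {
    my ($chunk) = @_;
    # In real use: run the elink/esummary queries on @$chunk here.
    push @sizes, scalar @$chunk;
});
print "@sizes\n";                 # prints "1000 1000 500"
```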
>> ----- Original Message -----
>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Friday, December 25, 2009 9:46 PM
>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
>> 
>> 
>>> Hi,
>>> 
>>> Does anyone know how to retrieve the "Source" or the "Species name" given
>>> the accession number using Bioperl? I have these 30,000 accession numbers
>>> for which I need to get the source organisms. Any kind of help will be
>>> appreciated.
>>> 
>>> Thanks
>>> 
>>> BD
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 




