[Bioperl-l] Locus Tag vs Accession number mappings

Chris Fields cjfields at illinois.edu
Wed Oct 22 20:54:42 UTC 2008


On Oct 22, 2008, at 1:22 PM, Alberto Davila wrote:

> Dear colleagues,
>
> I wonder whether there is a way to use BioPerl to generate  
> a mapping of the NCBI Locus Tag ID (eg: MSMEG_2393, TA21330) to  
> GenBank Accession Number (eg: AI568267,CR940347) or RefSeq accession  
> number (eg:  XM_949332.1) ?
>
> What would be the easiest way to do that ?
>
> I just asked NCBI-HelpDesk about this.
>
> Thanks, Alberto

For small lists (<500 IDs) you can query the nucleotide database  
directly (add 'srcdb_refseq[properties]' to the search term to limit  
the results to RefSeq):
-------------------------------------
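use strict;
use warnings;
use Bio::DB::EUtilities;

my @ids = qw(MSMEG_2393 TA21330);

# Build a term such as "MSMEG_2393[GENE] OR TA21330[GENE]"
my $term = join(' OR ', map { $_ . '[GENE]' } @ids);

# esearch: find the nucleotide records matching the locus tags.
# esearch returns only the first 20 UIDs unless retmax is raised.
my $eutil = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                      -db     => 'nucleotide',
                                      -term   => $term,
                                      -retmax => 500);

my @uids = $eutil->get_ids;

# esummary: reuse the same object to fetch summaries for those UIDs
$eutil->set_parameters(-eutil => 'esummary', -id => \@uids);

$eutil->print_DocSums;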
-------------------------------------
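
If the list grows past a few hundred IDs, one option is to split it
into batches and build one search term per batch, each of which can be
passed as -term to the esearch call above. The following is only a
minimal sketch: build_terms and its arguments are hypothetical names
made up for illustration, the batch size of 500 just mirrors the rough
limit mentioned above, and the RefSeq filter is written with an
underscore (srcdb_refseq), as Entrez expects.

```perl
use strict;
use warnings;

# Split the IDs into batches and return one Entrez search term per batch.
# build_terms is a hypothetical helper, not part of BioPerl.
sub build_terms {
    my ($ids, $batch_size, $refseq_only) = @_;
    my @pending = @$ids;    # copy, so the caller's list is left untouched
    my @terms;
    while (my @batch = splice @pending, 0, $batch_size) {
        my $term = join ' OR ', map { $_ . '[GENE]' } @batch;
        # Optionally restrict the hits to RefSeq records
        $term = "($term) AND srcdb_refseq[properties]" if $refseq_only;
        push @terms, $term;
    }
    return @terms;
}

# Two IDs fit in a single batch, so this prints one term
print "$_\n" for build_terms([qw(MSMEG_2393 TA21330)], 500, 1);
```

Each term would then go through its own esearch request, with the
resulting UIDs collected before the esummary step.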

If you have more IDs you can 'epost' them in increments, up to  
1000-2000 I think.  Beyond that, you should probably download one of  
the mapping files in the ftp.ncbi.nih.gov/gene/DATA folder and work  
with it locally: index the data once with DB_File, then do the  
lookups through a tied hash.
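
A rough sketch of the local approach, under some assumptions: it
expects that you have already reduced the downloaded mapping file to a
two-column, tab-delimited file (locus tag, then accession). That input
layout, the file names and the build_index/lookup helpers are all
hypothetical and only there for illustration; the column positions in
the actual NCBI files should be checked against their header lines.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);

# Build the on-disk index once. $tsv is the assumed two-column,
# tab-delimited file: locus tag, then accession.
# Running this twice on the same $dbfile appends the accessions again,
# so remove the old index before rebuilding.
sub build_index {
    my ($tsv, $dbfile) = @_;
    tie my %index, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $dbfile: $!";
    open my $in, '<', $tsv or die "Cannot open $tsv: $!";
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blanks
        my ($tag, $acc) = split /\t/, $line;
        next unless defined $acc and $tag ne '-'; # skip rows without a tag
        # A locus tag can map to several accessions; store them comma-joined
        $index{$tag} = exists $index{$tag} ? "$index{$tag},$acc" : $acc;
    }
    close $in;
    untie %index;
}

# Look up locus tags through the tied hash; returns a hashref of the hits.
sub lookup {
    my ($dbfile, @tags) = @_;
    tie my %index, 'DB_File', $dbfile, O_RDONLY, 0644, $DB_HASH
        or die "Cannot tie $dbfile: $!";
    my %hits = map { $_ => $index{$_} } grep { exists $index{$_} } @tags;
    untie %index;
    return \%hits;
}

# Usage: perl script.pl map.tsv map.db TAG [TAG ...]
if (@ARGV >= 3) {
    my ($tsv, $dbfile, @tags) = @ARGV;
    build_index($tsv, $dbfile) unless -e $dbfile;
    my $hits = lookup($dbfile, @tags);
    print "$_\t$hits->{$_}\n" for sort keys %$hits;
}
```

The index is built only on the first run; later runs just tie the
existing file, so lookups stay fast even for a large mapping table.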

chris


