[Bioperl-l] Locus Tag vs Accession number mappings
Chris Fields
cjfields at illinois.edu
Wed Oct 22 20:54:42 UTC 2008
On Oct 22, 2008, at 1:22 PM, Alberto Davila wrote:
> Dear colleagues,
>
> I would like to know if there is a way to use bioperl to generate
> a mapping of the NCBI Locus Tag ID (eg: MSMEG_2393, TA21330) to
> GenBank Accession Number (eg: AI568267,CR940347) or RefSeq accession
> number (eg: XM_949332.1) ?
>
> What would be the easiest way to do that ?
>
> I just asked NCBI-HelpDesk about this.
>
> Thanks, Alberto
For small lists (<500) you can query the nucleotide database directly
(you can add 'srcdb refseq[properties]' to the search term to limit
to just RefSeq):
-------------------------------------
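use strict;
use warnings;
use Bio::DB::EUtilities;

# locus tags to look up; each one is searched in the [GENE] field
my @ids  = qw(MSMEG_2393 TA21330);
my $term = join(' OR ', map { $_ . "[GENE]" } @ids);

# esearch: find the nucleotide UIDs matching any of the locus tags
my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -db    => 'nucleotide',
                                     -term  => $term);
my @uids = $eutil->get_ids;

# esummary: reuse the same object to fetch and print the summaries
$eutil->set_parameters(-eutil => 'esummary', -id => \@uids);
$eutil->print_DocSums;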
-------------------------------------
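If your list is longer than that, one option is to split it into batches and build one search term per batch. The following is only a sketch: build_terms is a made-up helper name, the default batch size of 499 just follows the "<500" rule of thumb above, and the RefSeq filter string is the one mentioned above. Each returned term would then go into its own esearch call as -term.

```perl
use strict;
use warnings;

# Hypothetical helper: build one esearch term per batch of locus tags.
# $max_per_query defaults to 499 (an assumption based on the "<500" advice).
# If $refseq_only is true, the RefSeq filter is ANDed onto each term.
sub build_terms {
    my ($ids, $max_per_query, $refseq_only) = @_;
    $max_per_query ||= 499;

    my @queue = @$ids;    # work on a copy so the caller's list is untouched
    my @terms;
    while (my @batch = splice(@queue, 0, $max_per_query)) {
        my $term = join(' OR ', map { $_ . '[GENE]' } @batch);
        $term = "($term) AND srcdb refseq[properties]" if $refseq_only;
        push @terms, $term;
    }
    return @terms;
}
```

For example, build_terms(\@ids, 2) on three IDs gives two terms, the first with two tags joined by ' OR ' and the second with the remaining tag.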
If you have more IDs you can 'epost' them in increments, up to 1000-2000
at a time I think. Beyond that, you should probably download one of the
mapping files in the ftp.ncbi.nih.gov/gene/DATA folder and use it
locally (index the data once with DB_File, then search using a tied
hash, etc).
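A rough sketch of the local approach, in case it helps. The sub names index_columns and open_index are made up for illustration, and the column numbers are left to the caller because they depend on which mapping file you use; check the header line of the file you download. Treating '-' as an empty field is also an assumption about the file format.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;

# Hypothetical helper: read a tab-delimited mapping file from $fh and store
# key => value pairs in %$index (a tied or a plain hash).
# $key_col and $val_col are 0-based column numbers chosen by the caller.
sub index_columns {
    my ($fh, $index, $key_col, $val_col) = @_;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;              # skip header/comment lines
        chomp $line;
        my @f = split /\t/, $line, -1;      # -1 keeps trailing empty fields
        next if $#f < $key_col || $#f < $val_col;
        my ($key, $val) = @f[$key_col, $val_col];
        next if $key eq '' || $key eq '-';  # assumed: '-' means "no value"
        $index->{$key} = $val;
    }
    return $index;
}

# Hypothetical helper: tie a hash to an on-disk Berkeley DB file so the
# index only has to be built once and can be reopened later.
sub open_index {
    my ($db_path) = @_;
    tie my %index, 'DB_File', $db_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $db_path: $!";
    return \%index;
}
```

Usage would be along the lines of: my $idx = open_index('locus.db'); index_columns($fh, $idx, $tag_col, $id_col); and afterwards $idx->{'MSMEG_2393'} is an ordinary hash lookup. If I remember correctly, gene_info carries the locus tag next to the GeneID and gene2accession carries the accessions per GeneID, so you would likely build two such indexes and chain the lookups, but verify that against the files themselves.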
chris