[Bioperl-l] retrieve refseq ids from UIDs

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Jun 28 21:26:34 UTC 2011


Gene and tax data are all tab-separated so it's just a matter of splitting into a hash then querying.
There's a readme in each file (I think) that describes what goes where.
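
As a rough illustration of the "split into a hash, then query" idea, here is a minimal sketch in Python (the same approach maps directly onto a Perl hash). The function name, the column positions (GeneID in the second column, accessions in the fourth, sixth and eighth) and the use of "-" for empty fields are assumptions for illustration, and the sample rows are made up; check the header line and README of the file you download before relying on any of it.

```python
from collections import defaultdict


def index_refseq_by_gene(lines, gene_col=1, accession_cols=(3, 5, 7)):
    """Build {GeneID: set of accessions} from tab-separated lines.

    The column positions are assumptions; verify them against the
    header line and README of the file you actually downloaded.
    """
    index = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= gene_col:
            continue  # malformed or truncated row
        for col in accession_cols:
            # "-" is treated here as the placeholder for an empty field
            if col < len(fields) and fields[col] not in ("", "-"):
                index[fields[gene_col]].add(fields[col])
    return index


# Made-up rows in the assumed layout, for illustration only.
sample = [
    "#tax_id\tGeneID\tstatus\tRNA_acc\tRNA_gi\tprotein_acc\tprotein_gi\tgenomic_acc\n",
    "9606\t1\tREVIEWED\tNM_EXAMPLE1.1\t111\tNP_EXAMPLE1.1\t222\tNC_EXAMPLE1.1\n",
    "9606\t1\tREVIEWED\tNM_EXAMPLE2.1\t333\t-\t-\tNC_EXAMPLE1.1\n",
    "9606\t2\tVALIDATED\t-\t-\t-\t-\tNC_EXAMPLE2.1\n",
]

index = index_refseq_by_gene(sample)
for gene_id in ("1", "2"):
    print(gene_id, sorted(index.get(gene_id, ())))
```

For a real download you would pass an open file handle instead of the sample list, for example one returned by gzip.open(path, "rt"). The files are large, so filtering on the tax_id column while reading is probably worth doing if you only care about a few species.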

--Russell

> -----Original Message-----
> From: carandraug at gmail.com [mailto:carandraug at gmail.com] On Behalf Of
> Carnë Draug
> Sent: Wednesday, 29 June 2011 9:24 a.m.
> To: Smithies, Russell
> Cc: bioperl mailing list
> Subject: Re: [Bioperl-l] retrieve refseq ids from UIDs
> 
> 2011/6/28 Smithies, Russell <Russell.Smithies at agresearch.co.nz>:
> > It's fairly common for NCBI to return partial or incomplete data;
> > often half a record is missing, or requests will time out at random.
> > If you have a lot of records, it may be better to download all the
> > data from the ftp site then parse it locally. This is what we tend to
> > do if there's more than a few hundred queries. I'd like to point out
> > that it's NCBI's problem, not the BioPerl code at fault. You'll run
> > into the same problems if you use NCBI's Perl API
> > (http://www.ncbi.nlm.nih.gov/books/NBK1058/) directly.
> 
> Is there any way to catch this kind of error? Other than repeatedly
> fetching the data until two consecutive fetches return the same
> result?
> 
> > Take a look at the gene2accession, gene2refseq, and gene_info data at
> > ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ and at the tax data
> > ftp://ftp.ncbi.nih.gov/pub/taxonomy/ if you need to decode the taxids
> > without doing web queries.
> > It's much easier/faster to download these files, index them, then
> > search rather than do queries against NCBI.
> 
> Is there any module already written to parse these files?
> 
> Thanks for all your answers,
> Carnë



