[Bioperl-l] Retrieving Gene Info from NCBI

Brian Osborne osborne1 at optonline.net
Mon Sep 11 19:49:22 UTC 2006


Ryan,

I'm not completely sure I understand the question but I will try to answer.
Yes, you can retrieve genes from Entrez Gene as objects in at least 3 ways.
One is to use Bio::DB::EntrezGene, but the problem here is that you need to
know the "Gene id"; you can't use something like "LOC490757". This is due to
a limitation of the Entrez Gene API, and I assume this limitation is still in
effect. As you know, there are files available at NCBI that map accessions
and identifiers to Gene ids.
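
To illustrate the mapping step, here is a minimal sketch in Python (not
Bioperl) of turning an accession into a Gene id using one of those mapping
files. It assumes a tab-delimited file with a header line that names its
columns, including "GeneID" and "RNA_nucleotide_accession.version"; those
column names, the function name and the sample rows are assumptions made for
illustration, so check them against the file you actually download.

```python
import csv
import io


def build_accession_index(handle, column="RNA_nucleotide_accession.version"):
    """Map accessions (version suffix stripped) to Gene ids.

    Assumes a tab-delimited table whose first line is a header that may
    start with '#'. The column names are assumptions, not a guaranteed layout.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")
    acc_col = header.index(column)
    gene_col = header.index("GeneID")

    index = {}
    for row in reader:
        accession = row[acc_col]
        if accession == "-":  # assumed placeholder for "no accession"
            continue
        # Drop the ".N" version so "XM_547879.2" is found as "XM_547879".
        index[accession.split(".")[0]] = row[gene_col]
    return index


# Illustrative rows only, not copied from an NCBI file.
SAMPLE = (
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version\n"
    "9615\t490757\tMODEL\tXM_547879.2\n"
    "9615\t100000001\tMODEL\t-\n"
)

index = build_accession_index(io.StringIO(SAMPLE))
print(index["XM_547879"])
```

The Gene id looked up this way is what you would then pass to
Bio::DB::EntrezGene.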

Or, there's the Ensembl API. I'd expect that you could query this API with
your accession successfully but I haven't used this API much except to know
that it's quite powerful. Take a look at this FAQ question:

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

Or, you can download the Entrez Gene ASN.1 file and use SeqIO.

If you choose pure Bioperl over Ensembl you'll see that there's quite a bit
of information in these Sequence objects from Entrez Gene, so you need to do
a bit of studying to find out where the desired data is.

Brian O.


On 9/11/06 1:40 PM, "Ryan Golhar" <golharam at umdnj.edu> wrote:

> I'm not sure this is possible but I'll ask anyway:
> 
> NCBI contains the Genomic region information in the Gene database for
> every known gene.  For instance, if you search NCBI for XM_547879, there
> is 1 entry in "Gene".  Follow that entry and it takes you to
> LOC490757...follow that and it takes you to the Gene db entry.  Under
> genomic regions, etc, it shows the exons, intron, and UTRs.  How can I
> extract this information from Entrez Gene?  Is it possible with Bioperl?
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




