[Biopython] how to obtain official Gene Symbols for a list of GeneNames

Peter Cock p.j.a.cock at googlemail.com
Fri Jan 8 16:48:40 UTC 2010


Please CC the mailing list.

On Fri, Jan 8, 2010 at 4:09 PM, Sameet Mehta <msameet at gmail.com> wrote:
> Hi,
> My list contains gene names such as DKFZP586P0123, RPL6, etc.  What I
> do is search for these in the NCBI Gene database manually, and then I get
> the official Gene Symbol.  I want to automate this process.  I am of
> course interested only in official gene symbols for humans.
>
> Sameet

OK, so via my browser using Entrez Gene, I used:

DKFZP586P0123 "Homo sapiens"[orgn]

This maps uniquely to C2CD3. However,

RPL6 "Homo sapiens"[orgn]

maps to several hits (some discontinued), including things like
RPL6P13. Clearly we need to make the search a little more
specific... we only want to search for a name or gene symbol
(not the default search on all fields).

It looks like searching on "gene" works nicely, see also:
http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/

Entrez queries like these seem to give unique matches:

DKFZP586P0123[gene] "Homo sapiens"[orgn]
RPL6[gene] "Homo sapiens"[orgn]

e.g.

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.com"
>>> search = Entrez.read(Entrez.esearch(db='gene', term='DKFZP586P0123[gene] "Homo sapiens"[orgn]', retmode='xml'))
>>> print(search["IdList"])
['26005']
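
To run the same search over a whole list of names, something along these
lines might do. It is only a sketch: build_gene_query and find_gene_ids
are names made up for this example (they are not part of Biopython), and
it assumes the [gene] and [orgn] field tags behave as in the queries above.

```python
def build_gene_query(name, organism="Homo sapiens"):
    """Return an Entrez Gene term restricted to the [gene] and [orgn] fields."""
    return '%s[gene] "%s"[orgn]' % (name.strip(), organism)


def find_gene_ids(names, email):
    """Map each name to the list of Gene UIDs that ESearch returns for it."""
    # Imported here so build_gene_query can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    ids = {}
    for name in names:
        handle = Entrez.esearch(db="gene", term=build_gene_query(name), retmode="xml")
        ids[name] = list(Entrez.read(handle)["IdList"])
        handle.close()
    return ids
```

Any name whose list does not hold exactly one ID is either unknown or
still ambiguous and needs a look by hand. For the single name searched
above, the list held just the one ID, 26005.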

That unique ID we got back (26005) is the UID for this gene, which
you should be able to use with EFetch (or ELink?). For example, you
could download the whole record as XML, and parse that:

>>> result = Entrez.read(Entrez.efetch(db='gene', id='26005', retmode='xml'))
>>> result[0]['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']
'C2CD3'

However, this next approach is a much quicker download, and so
looks like a more efficient way to get the desired gene symbol:

>>> print(Entrez.efetch(db='gene', id='26005', retmode='text', rettype='brief').read())

1: C2CD3 C2 calcium-depend... [GeneID: 26005]
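
If you go the plain text route you still have to pull the symbol out of
that line. A rough sketch, assuming every record comes back shaped like
the one line shown above (index, symbol, truncated description, then the
GeneID in square brackets); parse_brief is a made-up helper name, and
the pattern is guessed from that single example, not from any
documented format:

```python
import re

# index, colon, symbol, optional description, then "[GeneID: digits]"
BRIEF_LINE = re.compile(r"^\s*\d+:\s+(\S+)\s.*\[GeneID:\s*(\d+)\]")


def parse_brief(text):
    """Return (symbol, gene_id) pairs for lines shaped like the example reply."""
    pairs = []
    for line in text.splitlines():
        match = BRIEF_LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

Lines that do not fit the pattern are skipped silently, so it is worth
checking that you get one pair back per ID you asked for.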

Next, read the Entrez chapter in the Biopython Tutorial, especially
the bit about the history functionality for linking ESearch and EFetch.
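
As a rough idea of what the history approach could look like for this
problem: one ESearch covering all the names with usehistory="y", then
one EFetch using the returned WebEnv and QueryKey. This is an untested
sketch. build_batch_query and fetch_symbols are names invented here,
and combining the names with OR inside a single term is my assumption
about how best to batch them.

```python
def build_batch_query(names, organism="Homo sapiens"):
    """OR together one [gene] clause per name, then restrict by organism."""
    clauses = " OR ".join("%s[gene]" % name.strip() for name in names)
    return '(%s) AND "%s"[orgn]' % (clauses, organism)


def fetch_symbols(names, email):
    """Return the symbols of all Gene records matching any of the names."""
    # Imported lazily; only this function needs Biopython and the network.
    from Bio import Entrez

    Entrez.email = email
    search = Entrez.read(
        Entrez.esearch(db="gene", term=build_batch_query(names), usehistory="y")
    )
    # Reuse the server-side result set instead of sending the IDs back.
    records = Entrez.read(
        Entrez.efetch(
            db="gene",
            webenv=search["WebEnv"],
            query_key=search["QueryKey"],
            retmode="xml",
        )
    )
    return [r["Entrezgene_gene"]["Gene-ref"]["Gene-ref_locus"] for r in records]
```

Note that a batched query like this no longer tells you which input
name produced which symbol, so if you need that mapping, searching one
name at a time may be the simpler option.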

Peter
