[Bioperl-l] Extracting Gene Names from Gene Ontology (GO) with Perl
Sean Davis
sdavis2 at mail.nih.gov
Mon Apr 16 15:55:14 UTC 2007
> > On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?
This is a pretty simple problem if you have the data in a usable format. The
data that you need are available here:
ftp://ftp.ncbi.nih.gov/gene/DATA
The README file gives details, but the files in this directory are all
tab-delimited text. Download the gene2go.gz file, which contains a mapping
from Entrez Gene ID to GO accession. Then, download the gene_info.gz file,
which contains the information about the Entrez Gene ID, including
description, gene symbol, etc. If you need to link to other data, you can of
course download the respective files from NCBI. You can either load the data
into a SQL database of some type for general queries, or you can simply read
them into Perl directly (with appropriate data structures) to do your mapping.
Since they are tab-delimited text, I would choose the database route and then
use SQL and DBI to do the queries you like.
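
If you would rather try the read-it-into-Perl route first, a minimal sketch
might look like the following. It assumes the files have already been
gunzipped, that lines starting with "#" are headers, and that the first three
columns are tax_id, GeneID, GO_ID in gene2go and tax_id, GeneID, Symbol in
gene_info; please check the README for the actual layout. The subroutine
names are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED column layout (verify against the README in the FTP directory):
#   gene2go:   tax_id <TAB> GeneID <TAB> GO_ID  <TAB> ...
#   gene_info: tax_id <TAB> GeneID <TAB> Symbol <TAB> ...

# Build { GO accession => { GeneID => 1 } } from an open gene2go filehandle.
sub read_gene2go {
    my ($fh) = @_;
    my %genes_for_go;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;               # skip header/comment lines
        chomp $line;
        my (undef, $gene_id, $go_id) = split /\t/, $line;
        next unless defined $go_id;          # skip short or blank lines
        $genes_for_go{$go_id}{$gene_id} = 1; # inner hash removes duplicates
    }
    return \%genes_for_go;
}

# Build { GeneID => gene symbol } from an open gene_info filehandle.
sub read_gene_info {
    my ($fh) = @_;
    my %symbol_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my (undef, $gene_id, $symbol) = split /\t/, $line;
        next unless defined $symbol;
        $symbol_for{$gene_id} = $symbol;
    }
    return \%symbol_for;
}

# Return the sorted gene symbols annotated with the given GO accession.
sub symbols_for_go {
    my ($go_id, $genes_for_go, $symbol_for) = @_;
    my @gene_ids = keys %{ $genes_for_go->{$go_id} || {} };
    my @symbols  = grep { defined } map { $symbol_for->{$_} } @gene_ids;
    @symbols = sort @symbols;
    return @symbols;
}

# Usage: perl go2genes.pl GO:0005576 gene2go gene_info
if (@ARGV == 3) {
    my ($go_id, $gene2go_file, $gene_info_file) = @ARGV;
    open my $g2g,  '<', $gene2go_file   or die "Cannot open $gene2go_file: $!";
    open my $info, '<', $gene_info_file or die "Cannot open $gene_info_file: $!";
    my $genes_for_go = read_gene2go($g2g);
    my $symbol_for   = read_gene_info($info);
    print "$_\n" for symbols_for_go($go_id, $genes_for_go, $symbol_for);
}
```

Note that these files cover all species, so you would probably also want to
filter on the tax_id column. Both files are large, which is another reason
the database route may be more comfortable for repeated queries.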
Sean