[Bioperl-l] getting proteins matching GO

Nathan (Nat) Goodman natg at shore.net
Sun Nov 7 02:12:21 EST 2004


Hi Pedro

> Pedro Antonio Reche wrote:
> Dear Stefan, thanks a lot for your e-mail. Actually, I am interested
> in getting all proteins from all organisms that are tagged with, let's
> say, the go_process cell signaling...

The tricky part of working with GO annotations is that the terms are
arranged in a hierarchical ontology.  When you ask for proteins tagged with
a particular term, e.g., cell-cell signaling (GO:0007267), you probably also
want proteins tagged with any term subordinate to it.  For cell-cell
signaling there happen to be 93 such terms.  I don't know if any of the
sites mentioned by Stefan have this information at hand, but I have produced
a table of it which I'm happy to share.  It has 168,071 rows.  If you're
interested in just a few terms, like cell-cell signaling, I can run the
query and send you only that part of the table, if that would be easier for
you.
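
To make the "subordinate terms" idea concrete, here is a minimal sketch of
collecting a term plus everything beneath it.  It is written in Python purely
for illustration and is not how the table above was built.  The function name,
the (child, parent) edge format, and every identifier except GO:0007267 are
made up for the example; real edges would come from the GO ontology files.

```python
from collections import defaultdict, deque


def descendants(edges, root):
    """Return root plus every term reachable by following child links.

    edges is an iterable of (child, parent) pairs (a hypothetical format).
    """
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            # GO is a graph, not a tree: a term can be reached by several
            # paths, so remember what has already been visited.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy edges; only GO:0007267 is a real identifier, the rest are placeholders.
toy_edges = [
    ("GO:A", "GO:0007267"),
    ("GO:B", "GO:0007267"),
    ("GO:C", "GO:A"),
    ("GO:C", "GO:B"),  # a term with two parents
    ("GO:D", "GO:X"),  # unrelated branch, must not be picked up
]
terms = descendants(toy_edges, "GO:0007267")
```

On the toy data, terms holds the starting term and the three placeholder
terms beneath it, and leaves out the unrelated branch.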

The next step is to connect proteins to GO terms. I think the file you want
is gene_association.goa_uniprot.gz at
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/.  It's a flatfile that's
easy to parse.  Perhaps other readers can comment on whether there are
better sources for the protein-GO connections you need.  A good way to
proceed is to load the data into a relational database and then join it with
the table of GO terms described above.  You can also do the processing in
Perl.
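
As a rough sketch of the load-and-join step, the following uses Python with
an in-memory SQLite database (Python only for illustration; the same logic
carries over to Perl and any other database).  It assumes the usual
gene-association layout: tab-separated columns, comment lines starting with
"!", the protein accession in column 2 and the GO ID in column 5.  The table
names, sample lines, accessions, and placeholder GO IDs are invented for the
example.

```python
import sqlite3


def parse_annotations(lines):
    """Yield (protein_accession, go_id) pairs from gene-association lines."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comment/header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed lines
        yield cols[1], cols[4]  # column 2 = accession, column 5 = GO ID


# Invented sample lines standing in for the real file.
sample_lines = [
    "!sample header line\n",
    "UniProt\tP00001\tABC1\t\tGO:0007267\tPMID:0\tIEA\t\tP\n",
    "UniProt\tP00002\tABC2\t\tGO:C\tPMID:0\tIEA\t\tP\n",
    "UniProt\tP00003\tABC3\t\tGO:D\tPMID:0\tIEA\t\tP\n",
]
# The term of interest plus its subordinate terms (placeholders here).
wanted_terms = ["GO:0007267", "GO:A", "GO:B", "GO:C"]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE annotation (protein TEXT, go_id TEXT)")
db.execute("CREATE TABLE wanted (go_id TEXT PRIMARY KEY)")
db.executemany("INSERT INTO annotation VALUES (?, ?)",
               parse_annotations(sample_lines))
db.executemany("INSERT INTO wanted VALUES (?)",
               [(term,) for term in wanted_terms])

# Join annotations against the wanted terms to get the matching proteins.
proteins = [row[0] for row in db.execute(
    "SELECT DISTINCT a.protein FROM annotation a "
    "JOIN wanted w ON a.go_id = w.go_id ORDER BY a.protein")]
```

For the real file you could pass gzip.open(path, "rt") to parse_annotations
in place of sample_lines.  Note that this sketch ignores the qualifier
column, so annotations marked NOT would need to be filtered out separately.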

Good luck,
Nat
----------------------------------------------------------------------
Nathan (Nat) Goodman
Senior Research Scientist
Institute for Systems Biology
1441 North 34th Street
Seattle, WA 98103-8904
206-331-0077
206-363-0431 (fax)
natg at shore.net
http://home.comcast.net/~natgoodman/