[Bioperl-l] getting proteins matching GO
Pedro Antonio Reche
reche at research.dfci.harvard.edu
Mon Nov 8 08:24:16 EST 2004
Dear Nathan, thanks a lot for your help. As you mention, I wish to
collect all proteins tagged with a given term or with any term
subordinate to it. There are several terms whose proteins I am
interested in retrieving (all related to the immune system), but I
have not defined the list entirely yet. Therefore, I guess it will be
easier if you could send me the file you indicated. I have just
retrieved gene_association.goa_uniprot.gz. Thanks for the tip.
I am looking forward to hearing from you.
Best,
pdro
> Hi Pedro
>
>> Pedro Antonio Reche wrote:
>> Dear Stefan, thanks a lot for your e-mail. Actually, I am interested
>> in getting all proteins from all organisms that are tagged with,
>> let's say, the go_process cell signaling...
>
> The tricky part of working with GO annotations is that they are
> arranged in a hierarchical ontology. When you talk about wanting
> proteins that are tagged with a particular term, e.g., cell-cell
> signaling (GO:0007267), you probably also want proteins tagged with
> terms subordinate to the given term. There happen to be 93 such
> terms. I don't know if any of the sites mentioned by Stephan have
> this information at hand, but I have produced a table which I'm
> happy to share. It has 168,071 rows. If there are just a few terms
> that you're interested in, like cell-cell signaling, I can do the
> query for you and send you just that part of the table if that would
> be easier for you.
>
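The subordinate-term lookup described above amounts to collecting every
descendant of a term in the ontology graph. Below is a minimal sketch in
Python, assuming the hierarchy is available as (parent, child) pairs. The
function name, the EDGES list, and the TERM:* identifiers are hypothetical
placeholders for illustration; only GO:0007267 comes from the message.

```python
from collections import defaultdict, deque


def descendants(edges, root):
    """Return the set of terms reachable from root via parent -> child edges."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        # .get avoids creating empty entries for leaf terms
        for child in children.get(term, ()):
            if child not in seen:  # a term can have several parents
                seen.add(child)
                queue.append(child)
    return seen


# Toy hierarchy with made-up term IDs (not real GO data).
EDGES = [
    ("GO:0007267", "TERM:A"),
    ("GO:0007267", "TERM:C"),
    ("TERM:A", "TERM:B"),
    ("TERM:C", "TERM:B"),   # TERM:B has two parents
    ("TERM:X", "TERM:Y"),   # unrelated branch, must not be returned
]

print(sorted(descendants(EDGES, "GO:0007267")))
# -> ['TERM:A', 'TERM:B', 'TERM:C']
```

The root term itself is not included in the result, so a query for proteins
"tagged with the term or anything below it" would use the returned set plus
the root.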
> The next step is to connect proteins to GO terms. I think the file
> you want is gene_association.goa_uniprot.gz at
> ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/. Perhaps other
> readers can comment on whether there are better sources for the
> protein-GO connections you need. It's a flatfile that's easy to
> parse. A good way to proceed is to load the data into a relational
> database and then join with the GO defs from the paragraph above.
> You can also do the processing in Perl.
>
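The load-and-join step suggested above can be sketched as follows, here in
Python with an in-memory SQLite database instead of Perl. This is an
illustration under stated assumptions, not a tested pipeline for the real
file: it assumes the association file is tab-delimited with "!" comment
lines, with the protein identifier in column 2, the qualifier in column 4,
and the GO ID in column 5. The function names, table names, and SAMPLE rows
are hypothetical.

```python
import sqlite3


def parse_gaf(lines):
    """Yield (protein_id, go_id, qualifier) from tab-delimited association lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and "!" comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed rows
        # Assumed layout: col 2 = object ID, col 4 = qualifier, col 5 = GO ID
        yield cols[1], cols[4], cols[3]


def proteins_for_terms(gaf_lines, go_ids):
    """Return sorted protein IDs annotated with any of go_ids, skipping NOT rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assoc (protein TEXT, go_id TEXT, qualifier TEXT)")
    conn.execute("CREATE TABLE wanted (go_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO assoc VALUES (?, ?, ?)", parse_gaf(gaf_lines))
    conn.executemany("INSERT OR IGNORE INTO wanted VALUES (?)",
                     [(g,) for g in go_ids])
    rows = conn.execute(
        "SELECT DISTINCT a.protein FROM assoc a "
        "JOIN wanted w ON a.go_id = w.go_id "
        "WHERE a.qualifier NOT LIKE '%NOT%' "
        "ORDER BY a.protein"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


# Made-up rows for illustration only (not real annotations).
SAMPLE = [
    "!example comment line",
    "UniProt\tPROT_A\tsymA\t\tGO:0007267\tREF:1\tIEA",
    "UniProt\tPROT_B\tsymB\t\tTERM:B\tREF:2\tIEA",
    "UniProt\tPROT_C\tsymC\tNOT\tGO:0007267\tREF:3\tIEA",
    "UniProt\tPROT_D\tsymD\t\tTERM:Z\tREF:4\tIEA",
]

print(proteins_for_terms(SAMPLE, {"GO:0007267", "TERM:B"}))
# -> ['PROT_A', 'PROT_B']
```

For the downloaded file, the lines could be read with
gzip.open("gene_association.goa_uniprot.gz", "rt") and passed in place of
SAMPLE. The full file is large, so an on-disk database with an index on
go_id is likely a better fit than ":memory:".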
> Good luck,
> Nat
> ----------------------------------------------------------------------
> Nathan (Nat) Goodman
> Senior Research Scientist
> Institute for Systems Biology
> 1441 North 34th Street
> Seattle, WA 98103-8904
> 206-331-0077
> 206-363-0431 (fax)
> natg at shore.net
> http://home.comcast.net/~natgoodman/