[Bioperl-l] getting proteins matching GO
Ian Donaldson
idonalds at blueprint.org
Fri Nov 12 17:09:24 EST 2004
Hi Pedro
Here is one other solution that may be useful if you are interested in
obtaining a list of proteins in one numbering space (i.e. GenBank GI's):
SeqHound maps all of the GO Annotations from the GO ftp site to GI's using
multiple sequence database cross-reference files. I have attached details
below.
These data are available via a remote programming API (in Perl/Java/C/C++)
using the following calls:
SHoundGiFromGOID
SHoundGiFromGOIDAndECode
SHoundGiFromGOIDList
SHoundGiFromGOIDListAndECode
SHoundGOECodeFromGiAndGOID
SHoundGOIDFromGi
SHoundGOIDFromGiList
SHoundGOIDFromLLID
SHoundGOIDFromRedundantGi
SHoundGOIDFromRedundantGiList
SHoundGOPMIDFromGiAndGOID
Details on the use of these functions are available at
http://www.blueprint.org/seqhound/seqhound_documentation.html
You can try out some of the http calls underneath the API calls to get a
feel for results before you use the API in a program.
Try
http://seqhound.blueprint.org/cgi-bin/seqrem?fnct=SeqHoundGiFromGOID&goid=50778
for proteins involved in positive regulation of the immune response.
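As a rough sketch, the http call above can be assembled in a script before it is sent. The helper name build_seqrem_url is made up for illustration; only the base URL, the fnct value and the goid parameter come from the example above. The format of the server's reply is not described here, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode

# Base URL of the SeqHound remote CGI, taken from the example above.
SEQREM_URL = "http://seqhound.blueprint.org/cgi-bin/seqrem"


def build_seqrem_url(fnct, **params):
    """Return the full request URL for one seqrem function call.

    Hypothetical helper: it only joins the function name and its
    parameters into a query string; it does not contact the server.
    """
    query = {"fnct": fnct}
    query.update(params)
    return SEQREM_URL + "?" + urlencode(query)


# GO ID 50778 is the "positive regulation of the immune response" example.
url = build_seqrem_url("SeqHoundGiFromGOID", goid=50778)
print(url)
```

The resulting string can be pasted into a browser or handed to any http client.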
You can use the returned GI's to retrieve other data from SeqHound like
sequence, sequence neighbours and conserved domains.
There are also functions to help you traverse the GO tree like
SHoundGODBGetChildrenOf and SHoundGODBGetParentOf.
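To gather proteins for a GO term and everything beneath it, one approach is to walk down the tree first and then look up GI's for each term found. The sketch below shows only the walk. The get_children argument is a hypothetical stand-in for a call such as SHoundGODBGetChildrenOf, whose actual signature and return type are not shown here, and the toy hierarchy uses made-up IDs. Because a GO term can have more than one parent, visited terms are tracked so each is returned once.

```python
def collect_descendants(goid, get_children):
    """Return goid plus every term reachable below it, each listed once.

    get_children is any callable that maps a GO ID to an iterable of
    its direct child IDs (an assumed interface, not the SeqHound API).
    """
    seen = set()
    stack = [goid]
    while stack:
        term = stack.pop()
        if term in seen:
            # Already visited through another parent.
            continue
        seen.add(term)
        stack.extend(get_children(term))
    return seen


# Toy hierarchy standing in for the real GO tree (IDs are made up).
# Term 4 has two parents, 2 and 3.
toy_children = {1: [2, 3], 2: [4], 3: [4], 4: []}
terms = collect_descendants(1, lambda t: toy_children.get(t, []))
print(sorted(terms))
```

Each ID in the result could then be passed to a lookup such as SHoundGiFromGOID, or the whole set to SHoundGiFromGOIDList.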
The compiled GO Annotation tables are also available from our ftp site in
MySQL or text format:
ftp://ftp.blueprint.org/pub/SeqHound/Data/goa/ and
ftp://ftp.blueprint.org/pub/SeqHound/Data/dbxref/
Details on these tables are available in the SeqHound manual
http://www.blueprint.org/seqhound/api_help/docs/The_SeqHound_Manual.pdf
These data were just updated as of November 11 by Renan Cavero in our group.
Best regards
Ian
************************
GOA Data Release V Notes
************************
No code changes from the previous data build.
Number of Records
DBXref: 11,866,278
Goa_gigo: 4,471,483
This release contains the following files:
DBXref files parsed from:
Uniprot 3.0
ftp://expasy.org/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz
ftp://expasy.org/databases/uniprot/knowledgebase/uniprot_trembl.dat.gz
Locus Link Oct 29
ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2ref
FlyBase.
ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.fb.gz
WormBase.
ftp://ftp.sanger.ac.uk/pub/databases/wormpep/wormpep.table
Mouse Genome Informatics.
ftp://ftp.informatics.jax.org/pub/reports/MRK_Sequence.rpt
ftp://ftp.informatics.jax.org/pub/reports/MRK_SwissProt_TrEMBL.rpt
Saccharomyces Genome Database.
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/dbxref.tab
Alliance for Cellular Signaling.
ftp://ftp.afcs.org/pub/mpdata/afcsflat.txt
The Institute for Genomic Research, Arabidopsis thaliana database.
ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/DATA_RELEASE_SUPPLEMENT/release_5.genbank_accessions.txt.gz
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.tigr_ath
DictyBase.
ftp://ftp.blueprint.org/pub/SeqHound/Private/DDB/dictybaseid_gb_accession.txt.gz
Rat Genome Database.
ftp://rgd.mcw.edu/pub/data_release/genbank_to_gene_ids.txt
The Zebrafish Information Network.
http://zfin.org/data_transfer/Downloads/genbank.txt
http://zfin.org/data_transfer/Downloads/refseq.txt
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.zfin
GeneDB_Spombe.
ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Mappings/gp2swiss.txt
The Arabidopsis Information Resource.
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.tigr_cmr
The Institute for Genomic Research, Comprehensive Microbial Resource.
NCBI UniGene is an experimental system for automatically partitioning GenBank
sequences.
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.unigene
Virus Database at University College London.
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.vida
IPI Cross Reference Files (human.xrefs, mouse.xrefs, rat.xrefs)
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/human.xrefs.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/MOUSE/mouse.xrefs.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/RAT/rat.xrefs.gz
GOA files parsed from: ftp://ftp.geneontology.org/pub/go/gene-associations/
gene_association.Compugen_GenBank.gz
gene_association.Compugen_UniProt.gz
gene_association.GeneDB_Lmajor.gz
gene_association.GeneDB_Pfalciparum.gz
gene_association.GeneDB_Spombe.gz
gene_association.GeneDB_Tbrucei.gz
gene_association.GeneDB_tsetse.gz
gene_association.ddb.gz
gene_association.fb.gz
gene_association.goa_human.gz
gene_association.goa_mouse.gz
gene_association.goa_pdb.gz
gene_association.goa_rat.gz
gene_association.goa_uniprot.gz
gene_association.gramene_oryza.gz
gene_association.mgi.gz
gene_association.rgd.gz
gene_association.sgd.gz
gene_association.tair.gz
gene_association.tigr_Athaliana.gz
gene_association.tigr_Banthracis.gz
gene_association.tigr_Cburnetii.gz
gene_association.tigr_Gsulfurreducens.gz
gene_association.tigr_Lmonocytogenes.gz
gene_association.tigr_Psyringae.gz
gene_association.tigr_Soneidensis.gz
gene_association.tigr_Tbrucei_chr2.gz
gene_association.tigr_Vcholerae.gz
gene_association.tigr_gene_index.gz
gene_association.wb.gz
gene_association.zfin.gz
The following GIs were spot checked:
select a.*
from goa_gigo a, seqhound.redund b, seqhound.redund c
where b.rgroup = c.rgroup
  AND a.gi = b.gi
  AND c.gi IN (3641615, 17647231, 6321311, 6323259, 6322304, 6324701,
               6324058, 10383763, 14318479, 6379287, 4758116, 28559088,
               6678656, 9755336, 6323421, 30923724);
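The spot-check query joins the redund table to itself so that an annotation attached to any GI in a redundancy group is found when any other member of that group is queried. A small sketch of that behaviour, using an in-memory SQLite database in place of the MySQL tables, is below. Only the table names and the gi and rgroup columns come from the query above; the goid column, the seqhound schema prefix being dropped, and all values in the tables are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE redund (gi INTEGER, rgroup INTEGER);
    CREATE TABLE goa_gigo (gi INTEGER, goid INTEGER);
    -- GIs 101 and 102 share redundancy group 1; GI 201 is alone in group 2.
    INSERT INTO redund VALUES (101, 1), (102, 1), (201, 2);
    -- Only GI 101 and GI 201 carry an annotation directly (toy values).
    INSERT INTO goa_gigo VALUES (101, 50778), (201, 6955);
""")

# Same join shape as the spot check: c is the queried GI, b is any GI in
# the same redundancy group, a is an annotation row for b.
rows = conn.execute("""
    SELECT a.gi, a.goid
    FROM goa_gigo a, redund b, redund c
    WHERE b.rgroup = c.rgroup
      AND a.gi = b.gi
      AND c.gi IN (102)
""").fetchall()
print(rows)
conn.close()
```

Here GI 102 has no annotation row of its own, yet the query returns the row for GI 101 because the two share a redundancy group.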