[Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO

Sean Davis sdavis2 at mail.nih.gov
Thu Apr 14 02:55:56 EDT 2005


On Apr 14, 2005, at 12:26 AM, jjmail at mac.com wrote:

> Question 1:
>
> I am brand new to bioperl and the related projects so please forgive 
> my ignorance on this. I have a large list of protein names and I would 
> like to use bioperl to get the corresponding Gene Ontology (GO) 
> information for each protein.
>
> So far I have installed bioperl, BioSQL, and bioperl-db and loaded 
> the taxonomy and GO information into BioSQL. I am having a really hard 
> time figuring out how to get the GO information out of the database. 
> If anyone knows the right doc to read or has a simple example program 
> that I could see, that would be really useful.
>

I see that Hilmar took a stab at answering your question on the details 
of GO and BioSQL.

> Question 2:
>
> I have collected protein expression data for various states, and I 
> would like to cluster the data based on GO information for a start and 
> then, if possible, use bioperl's ability to analyze mRNA array data to 
> analyze the protein data. Does this seem reasonable? Where should I 
> start looking to figure out how to do this?
>

This may reflect a bit of my own bias, but if you are looking at 
expression data (as in arrays, etc.), then I think the better tool to 
spend time with is Bioconductor. It is a collection of packages written 
for the R programming language, which is freely available to install. 
With Bioconductor, you can use the annotation-building package 
(AnnBuilder) to make an annotation package for all of the genes in your 
experiment. The annotation package you create contains GO information, 
biological pathways, chromosome locations, etc. You can then apply any 
of dozens of normalization, analysis, and clustering methods to group 
your data by whatever criteria you like, including some GO-based 
clustering.
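As a rough illustration of what grouping expression data by GO annotation means (this is not how AnnBuilder or any Bioconductor package works internally), here is a minimal sketch in Python. The protein IDs, expression values, and GO assignments are hypothetical placeholders; in practice the protein-to-GO mapping would come from an annotation source such as the package described above. It collects the proteins annotated with each GO term and averages their expression profiles state by state, giving one summary profile per term.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical example data: protein ID -> expression level in each state.
expression = {
    "P1": [1.0, 2.0, 3.0],
    "P2": [3.0, 2.0, 1.0],
    "P3": [2.0, 2.0, 2.0],
}

# Hypothetical annotation: protein ID -> set of GO term IDs.
go_annotations = {
    "P1": {"GO:0006915", "GO:0008152"},
    "P2": {"GO:0008152"},
    "P3": {"GO:0006915"},
}


def group_by_go_term(expression, go_annotations):
    """Map each GO term to the sorted list of proteins annotated with it.

    Proteins without expression data are skipped.
    """
    groups = defaultdict(list)
    for protein, terms in go_annotations.items():
        if protein not in expression:
            continue
        for term in terms:
            groups[term].append(protein)
    return {term: sorted(proteins) for term, proteins in groups.items()}


def mean_profile(proteins, expression):
    """Average the expression profiles of the given proteins, state by state."""
    return [mean(values) for values in zip(*(expression[p] for p in proteins))]


def go_term_profiles(expression, go_annotations):
    """Return one averaged expression profile per GO term."""
    groups = group_by_go_term(expression, go_annotations)
    return {term: mean_profile(proteins, expression) for term, proteins in groups.items()}


if __name__ == "__main__":
    for term, profile in sorted(go_term_profiles(expression, go_annotations).items()):
        print(term, profile)
```

The per-term profiles could then be fed to any standard clustering method; averaging is just the simplest summary and is chosen here for clarity, not because it is the recommended approach.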

Perl is just not the most natural tool for doing high-level, vectorized 
math. Bioconductor is built specifically for exploring array data and 
other high-throughput data.

Check out the site (http://www.bioconductor.org). There is also a 
mailing list for Bioconductor.

Sean
