[Bioperl-microarray] Extraction of gene expression levels from NCBI GEO...

Allen Day allenday at ucla.edu
Fri Dec 19 14:29:41 EST 2003


Daniel,

>      I am new to this list, so bear with me if my question gets asked a lot. 

Welcome!

> Being newer to bioinformatics, there are still many things I'm only now 
> becoming familiar with. What I am trying to accomplish is to establish a 
> method of retrieving Microarray data from NCBI's GEO database, specifically 
> in connection with a particular gene. Once that data has been pulled in, it 
> needs to be formatted to create a chart/report of basic expression levels in 
> various tissue types. Novartis has done a project like this at 
> http://expression.gnf.org, which retrieves the data points associated with a 
> particular gene and assembles the values into a Java-generated graph. 
> The graph is easy to read, but what I want most is the tissue expression 
> values themselves.
>      The problem with the aforementioned database is that all of its data is
> based on results from the Affymetrix U95A (human) and U74A (mouse) chips,
> which do not have probes corresponding to the most recent genes and ESTs.
> NCBI has data from the more recent U133A and B chips, as well as other
> array formats, and is therefore more likely to have the data I'm looking
> for, although that data is often derived from non-"normal" tissue.
> Initially, I expect to have to download the individual files from NCBI
> manually, then parse them with a Perl script to retrieve the expression
> values for the gene of interest. From there, I can assemble a report
> combining the overlapping data points to create a proposed average
> expression level of Gene X in Tissue Y. The bottom line is this: which
> Bioperl modules (if any) are best suited to my purposes? I have been

There isn't anything that does all of this.  I'm doing work very similar
to what you describe.  I use bioperl-microarray to parse the files (from
NCBI or elsewhere) so that I can get at the expression data as objects.
Then I load the data into a local Chado database (http://www.gmod.org)
using the RAD gene expression module and annotate the arrays using various
ontologies.
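
If all you need at first is the numbers for one gene, the parse-and-average
step you describe can be sketched without any of the above.  This is a rough
Python illustration, not bioperl-microarray code.  It assumes each downloaded
sample file is tab-delimited text whose data table sits between
"!sample_table_begin" and "!sample_table_end" lines and has ID_REF and VALUE
columns (check that against the files you actually get from GEO), and that
you supply the tissue label for each file by hand.  The names probe_value and
mean_by_tissue are made up for this example.

```python
from collections import defaultdict
from statistics import mean


def probe_value(sample_text, probe_id):
    """Return the VALUE for probe_id from one sample's data table, or None.

    Assumed layout: a header row and data rows, tab-delimited, between
    '!sample_table_begin' and '!sample_table_end' marker lines.
    """
    in_table = False
    header = None
    for line in sample_text.splitlines():
        marker = line.strip().lower()
        if marker.startswith("!sample_table_begin"):
            in_table, header = True, None
            continue
        if marker.startswith("!sample_table_end"):
            in_table = False
            continue
        if not in_table or not marker:
            continue
        fields = line.split("\t")
        if header is None:
            header = fields  # first row inside the table names the columns
            continue
        row = dict(zip(header, fields))
        if row.get("ID_REF") == probe_id:
            try:
                return float(row["VALUE"])
            except (KeyError, ValueError):
                return None  # missing column or non-numeric value
    return None


def mean_by_tissue(samples, probe_id):
    """samples: iterable of (tissue_label, sample_text) pairs.

    Returns {tissue_label: mean of the values found for probe_id}.
    Samples with no usable value for the probe are skipped.
    """
    values = defaultdict(list)
    for tissue, text in samples:
        value = probe_value(text, probe_id)
        if value is not None:
            values[tissue].append(value)
    return {tissue: mean(vals) for tissue, vals in values.items()}
```

Values from different chips or normalization schemes are generally not
directly comparable, so it is probably safest to average only within a single
platform.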

For the annotation step, it might make your life easier to obtain the
data in MAGE-ML format, but I don't think it's always available from
public databases.  Most of the time I end up annotating with ontology terms
by hand.

> browsing the docs and have not seen any that seem to apply to what I am
> doing, but I can easily miss something as there are so many modules in
> the latest release. Any ideas or suggestions would be most appreciated.  
> Thanks!!!!!

-Allen


