[Biopython-dev] GEO library revamp

Erik Clarke erikclarke at gmail.com
Wed May 16 18:51:45 UTC 2012

Thanks Sean, I'll definitely have a look at those.  I'm looking forward to hearing your thoughts or critiques of the implementation.


On May 16, 2012, at 10:20 AM, Sean Davis wrote:

> On Wed, May 16, 2012 at 1:07 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, May 15, 2012 at 5:44 PM, Erik Clarke <erikclarke at gmail.com> wrote:
> > Hi all,
> > I saw on the wiki that the Biopython GEO library was in need of some TLC.
> > I agree; a recent effort to use the parser for a project in our lab was
> > stymied by its lack of flexibility (it seems to be particularly poor at
> > reading GEO datasets, for instance).
> >
> > In response, we've developed a basic GEO module in Python loosely based on
> > GEOquery and the existing Geo module. Currently, our module is capable of
> > downloading and parsing all four major GEO record types and providing
> > rudimentary pretty-print output of the data. It also provides a
> > representation of a GDS file in a form amenable to statistical analysis
> > using SciPy. I've included a method that finds the enriched genes in a given
> > subset as a demonstration.
> >
> > Since it was an internal project before this, I would appreciate any
> > feedback on usability, bugs, etc. that we may not have caught. It's
> > still under active development as I flesh out some of the missing features
> > (better pretty-printing, bug fixes, complete unit-test coverage, etc.).
> >
> > In any case, my development branch of Biopython is here:
> > https://github.com/eclarke/biopython/tree/GEOQuery, and obviously all of the
> > new code is in the Bio/Geo folder (Records.py will replace Record.py). I've
> > tried to make it as well-commented as possible. I have not yet tested it on
> > Python < 2.7, but I plan on doing so.
> >
> > If this is of interest to anybody, I would be more than happy to tweak it
> > as people see fit and hopefully one day replace the current GEO parser.
> >
> > Cheers,
> > Erik Clarke
> > The Scripps Research Institute
> > La Jolla, CA
>
> Hi Erik,
>
> That does sound promising. Switching to using numpy seems
> very sensible :)
>
> As you'll have read on the "Project Ideas" list on the wiki, I was
> thinking we should draw inspiration from Sean Davis' GEOquery
> http://www.bioconductor.org/packages/bioc/html/GEOquery.html
> in R/Bioconductor - which I had previously used from Python via
> rpy http://www.warwick.ac.uk/go/peter_cock/r/geo/
>
> Sean sometimes posts here on the Biopython lists, so it would
> be great if he could comment on your work.
>
> I'm looking forward to taking a look. It will be great to have a native
> Python implementation. In the short term, Erik, you might take a look at
> some of the tests in the GEOquery package for some (high-level) edge cases
> that I have stumbled onto over the years.
> Sean
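
For readers unfamiliar with what "parsing GEO records" involves: GEO distributes records as SOFT text, in which lines starting with "^" open an entity, "!" lines carry attributes, "#" lines describe table columns, and the remaining lines are tab-delimited table rows. The following is only a minimal sketch of reading that layout. The function name parse_soft, the dictionary layout, and the sample values are all hypothetical; none of it is taken from the branch discussed above or from the existing Bio.Geo module.

```python
def parse_soft(lines):
    """Parse SOFT-formatted lines into a list of entity dicts (illustrative only)."""
    entities = []
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("^"):
            # Entity indicator line, e.g. "^DATASET = <accession>"
            etype, _, name = line[1:].partition("=")
            current = {
                "type": etype.strip(),
                "name": name.strip(),
                "attributes": {},
                "columns": {},
                "table": [],
            }
            entities.append(current)
        elif current is None:
            # Ignore anything that appears before the first entity line
            continue
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            # Table begin/end markers carry no value; skip them
            if key.lower().endswith(("_table_begin", "_table_end")):
                continue
            # An attribute may repeat, so collect its values in a list
            current["attributes"].setdefault(key, []).append(value.strip())
        elif line.startswith("#"):
            # Column description line, e.g. "#ID_REF = description"
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
        else:
            # Any other line is treated as a tab-delimited table row
            current["table"].append(line.split("\t"))
    return entities


# Made-up sample input, not a real GEO record
sample = [
    "^DATASET = GDS_EXAMPLE",
    "!dataset_title = Made-up title",
    "#ID_REF = Platform reference identifier",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tSAMPLE_1",
    "probe_1\tGENE_A\t12.5",
    "!dataset_table_end",
]
records = parse_soft(sample)
```

Here records[0]["table"] holds the header row followed by the data rows as lists of strings. A real parser would also need to convert values to numbers and handle the edge cases Sean mentions.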
