[Biopython-dev] GEO library revamp

Peter Cock p.j.a.cock at googlemail.com
Wed May 16 17:07:18 UTC 2012

On Tue, May 15, 2012 at 5:44 PM, Erik Clarke <erikclarke at gmail.com> wrote:
> Hi all,
> I saw on the wiki that the BioPython GEO library was in need of some TLC.
> I agree; a recent effort to use the parser for a project in our lab was
> stymied by its lack of flexibility (it seems to be particularly poor at
> reading GEO datasets, for instance).
> In response, we've developed a basic GEO module in Python loosely based on
> GEOQuery and the existing Geo module. Currently, our module is capable of
> downloading and parsing all four major GEO record types and providing
> rudimentary pretty-print output of the data. It also provides a
> representation of a GDS file in a form amenable to statistical analysis
> using SciPy. I've included a method that finds the enriched genes in a given
> subset as a demonstration.
> Since it was an internal project before this, I would appreciate any
> feedback in terms of usability, bugs, etc. that we may not have caught. It's
> still under active development as I flesh out some of the missing features
> (better pretty-printing, bug fixes, complete unit-test coverage, etc).
> In any case, my development branch of BioPython is here:
> https://github.com/eclarke/biopython/tree/GEOQuery, and obviously all of the
> new code is in the Bio/Geo folder (Records.py will replace Record.py). I've
> tried to make it as well-commented as possible. I have not yet tested it on
> Python < 2.7, but I plan on doing so.
> If this is of interest to anybody, I would be more than happy to tweak it
> as people see fit and hopefully one day replace the current GEO parser.
> Cheers,
> Erik Clarke
> The Scripps Research Institute
> La Jolla, CA

Hi Erik,

That does sound promising. Switching to using numpy seems
very sensible :)
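For anyone wanting to try the NumPy route, a minimal sketch of loading a
GDS data table into an array might look like the code below. It assumes
the SOFT layout of a GDS file: a tab-separated table between
!dataset_table_begin and !dataset_table_end, with ID_REF and IDENTIFIER
as the first two columns and "null" marking missing values. The function
name parse_gds_table is hypothetical and is not part of Erik's branch or
the existing Bio.Geo module.

```python
import numpy as np


def parse_gds_table(lines):
    """Return (probe_ids, gene_symbols, sample_ids, matrix) from GDS SOFT lines.

    Hypothetical helper. Assumes the data table is framed by
    !dataset_table_begin / !dataset_table_end, that its header row starts
    with ID_REF and IDENTIFIER, and that missing values are written "null"
    (converted to NaN here).
    """
    in_table = False
    header = None
    probes, genes, rows = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("!dataset_table_begin"):
            in_table = True
            continue
        if line.startswith("!dataset_table_end"):
            break
        if not in_table:
            continue  # skip the ^DATASET / !dataset_* metadata lines
        fields = line.split("\t")
        if header is None:
            header = fields  # ID_REF, IDENTIFIER, then one GSM id per sample
            continue
        probes.append(fields[0])
        genes.append(fields[1])
        rows.append([np.nan if v in ("null", "") else float(v) for v in fields[2:]])
    sample_ids = header[2:] if header else []
    # One row per probe, one column per sample.
    matrix = np.array(rows, dtype=float).reshape(len(rows), len(sample_ids))
    return probes, genes, sample_ids, matrix


# Tiny made-up example in the assumed layout (not real GEO data).
example = """^DATASET = GDS_EXAMPLE
!dataset_table_begin
ID_REF\tIDENTIFIER\tGSM1\tGSM2
probe_a\tGENE1\t1.0\t3.0
probe_b\tGENE2\tnull\t4.0
!dataset_table_end
""".splitlines()

probes, genes, samples, m = parse_gds_table(example)
print(samples)  # ['GSM1', 'GSM2']
print(np.nanmean(m, axis=1))  # per-probe mean, ignoring missing values
```

Once the table is a plain float array, per-probe or per-subset statistics
are ordinary NumPy/SciPy calls over rows or column slices, rather than
loops over parsed record objects.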

As you'll have read on the "Project Ideas" list on the wiki, I was
thinking we should draw inspiration from Sean Davis' GEOquery
in R/Bioconductor - which I had previously used from Python via
rpy http://www.warwick.ac.uk/go/peter_cock/r/geo/

Sean sometimes posts here on the Biopython lists, so it would
be great if he could comment on your work.

