[Biopython-dev] GEO library revamp

Sean Davis sdavis2 at mail.nih.gov
Wed May 16 17:20:11 UTC 2012


On Wed, May 16, 2012 at 1:07 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, May 15, 2012 at 5:44 PM, Erik Clarke <erikclarke at gmail.com> wrote:
> > Hi all,
> > I saw on the wiki that the BioPython GEO library was in need of some TLC.
> > I agree; a recent effort to use the parser for a project in our lab was
> > stymied by its lack of flexibility (it seems to be particularly poor at
> > reading GEO datasets, for instance).
> >
> > In response, we've developed a basic GEO module in Python loosely based on
> > GEOQuery and the existing Geo module. Currently, our module is capable of
> > downloading and parsing all four major GEO record types and providing
> > rudimentary pretty-print output of the data. It also provides a
> > representation of a GDS file in a form amenable to statistical analysis
> > using SciPy. I've included a method that finds the enriched genes in a given
> > subset as a demonstration.
> >
> > Since it was an internal project before this, I would appreciate any
> > feedback in terms of usability, bugs, etc that we may not have caught. It's
> > still under active development as I flesh out some of the missing features
> > (better pretty-printing, bug fixes, complete unit-test coverage, etc).
> >
> > In any case, my development branch of BioPython is here:
> > https://github.com/eclarke/biopython/tree/GEOQuery, and obviously all of the
> > new code is in the Bio/Geo folder (Records.py will replace Record.py). I've
> > tried to make it as well-commented as possible. I have not yet tested it on
> > Python < 2.7, but I plan on doing so.
> >
> > If this is of interest to anybody, I would be more than happy to tweak it
> > as people saw fit and hopefully one day replace the current GEO parser.
> >
> > Cheers,
> > Erik Clarke
> > The Scripps Research Institute
> > La Jolla, CA
>
> Hi Erik,
>
> That does sound promising. Switching to using numpy seems
> very sensible :)
>
> As you'll have read on the "Project Ideas" list on the wiki, I was
> thinking we should draw inspiration from Sean Davis' GEOquery
> http://www.bioconductor.org/packages/bioc/html/GEOquery.html
> in R/Bioconductor - which I had previously used from Python via
> rpy http://www.warwick.ac.uk/go/peter_cock/r/geo/
>
> Sean sometimes posts here on the Biopython lists, so it would
> be great if he could comment on your work.
>
>
I'm looking forward to taking a look. It will be great to have a native
Python implementation. In the short term, Erik, you might take a look at
some of the tests in the GEOquery package for some (high-level) edge cases
that I have stumbled onto over the years.

Sean


