[Biopython-dev] GEO library revamp
sdavis2 at mail.nih.gov
Wed May 16 17:20:11 UTC 2012
On Wed, May 16, 2012 at 1:07 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, May 15, 2012 at 5:44 PM, Erik Clarke <erikclarke at gmail.com> wrote:
> > Hi all,
> > I saw on the wiki that the BioPython GEO library was in need of some TLC.
> > I agree; a recent effort to use the parser for a project in our lab was
> > stymied by its lack of flexibility (it seems to be particularly poor at
> > reading GEO datasets, for instance).
> > In response, we've developed a basic GEO module in Python loosely based
> > on GEOquery and the existing Geo module. Currently, our module is capable of
> > downloading and parsing all four major GEO record types and providing
> > rudimentary pretty-print output of the data. It also provides a
> > representation of a GDS file in a form amenable to statistical analysis
> > using SciPy. I've included a method that finds the enriched genes in a
> > subset as a demonstration.
> > Since it was an internal project before this, I would appreciate any
> > feedback in terms of usability, bugs, etc. that we may not have caught.
> > It's still under active development as I flesh out some of the missing
> > features (better pretty-printing, bug fixes, complete unit-test coverage, etc.).
> > In any case, my development branch of BioPython is here:
> > https://github.com/eclarke/biopython/tree/GEOQuery, and obviously all of the
> > new code is in the Bio/Geo folder (Records.py will replace Record.py).
> > I've tried to make it as well-commented as possible. I have not yet tested it
> > on Python < 2.7, but I plan on doing so.
> > If this is of interest to anybody, I would be more than happy to tweak it
> > as people see fit and hopefully one day replace the current GEO parser.
> > Cheers,
> > Erik Clarke
> > The Scripps Research Institute
> > La Jolla, CA
> Hi Erik,
> That does sound promising. Switching to using numpy seems
> very sensible :)
> As you'll have read on the "Project Ideas" list on the wiki, I was
> thinking we should draw inspiration from Sean Davis' GEOquery
> in R/Bioconductor - which I had previously used from Python via
> rpy http://www.warwick.ac.uk/go/peter_cock/r/geo/
> Sean sometimes posts here on the Biopython lists, so it would
> be great if he could comment on your work.
I'm looking forward to taking a look. It will be great to have a native
python implementation. In the short term, Erik, you might take a look at
some of the tests in the GEOquery package for some (high level) edge cases
that I have stumbled onto over the years.
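A minimal sketch of the subset-enrichment idea Erik describes: a GDS expression table held as numbers, with each gene scored by comparing the samples inside a subset against the remaining samples. This is not code from his branch or from GEOquery. The simplified table layout (an ID_REF column, an IDENTIFIER column, then one column per sample, with "null" for missing values), the function names parse_gds_table, welch_t and enriched_genes, and the choice of Welch's t statistic are all assumptions for illustration. It uses only the standard library; with SciPy installed, scipy.stats.ttest_ind(a, b, equal_var=False) computes the same statistic and also returns a p-value.

```python
import math
import statistics
from typing import List, Sequence, Set, Tuple

# Hypothetical example data in an assumed, simplified GDS-like layout.
EXAMPLE_TABLE = [
    "ID_REF\tIDENTIFIER\tGSM1\tGSM2\tGSM3\tGSM4",
    "1\tGENE_A\t9.0\t9.5\t2.0\t2.5",
    "2\tGENE_B\t3.0\t3.2\t3.1\t2.9",
    "3\tGENE_C\t1.0\tnull\t5.0\t5.5",
]


def parse_gds_table(lines: Sequence[str]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Split a simplified tab-separated table into sample IDs, gene names and values.

    Assumption: this is NOT the full SOFT format; it only handles the
    header row plus data rows, and maps "null" to NaN.
    """
    header = lines[0].rstrip("\n").split("\t")
    samples = header[2:]  # columns after ID_REF and IDENTIFIER
    genes: List[str] = []
    values: List[List[float]] = []
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        genes.append(fields[1])
        values.append([math.nan if v.lower() == "null" else float(v) for v in fields[2:]])
    return samples, genes, values


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b), without assuming equal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return math.nan  # both groups constant: statistic undefined
    return (statistics.mean(a) - statistics.mean(b)) / se


def enriched_genes(samples: List[str], genes: List[str], values: List[List[float]],
                   subset: Set[str], top: int = 5) -> List[Tuple[str, float]]:
    """Rank genes by Welch's t of subset samples versus all other samples."""
    in_idx = [i for i, s in enumerate(samples) if s in subset]
    out_idx = [i for i, s in enumerate(samples) if s not in subset]
    scores: List[Tuple[str, float]] = []
    for gene, row in zip(genes, values):
        a = [row[i] for i in in_idx if not math.isnan(row[i])]
        b = [row[i] for i in out_idx if not math.isnan(row[i])]
        if len(a) < 2 or len(b) < 2:
            continue  # a sample variance needs at least two values per group
        t = welch_t(a, b)
        if not math.isnan(t):
            scores.append((gene, t))
    scores.sort(key=lambda gs: gs[1], reverse=True)  # highest in-subset expression first
    return scores[:top]
```

For example, enriched_genes(*parse_gds_table(EXAMPLE_TABLE), subset={"GSM1", "GSM2"}) ranks GENE_A ahead of GENE_B and skips GENE_C, which has only one non-missing value inside the subset. A real implementation would also need multiple-testing correction and the GDS subset definitions rather than a hand-written sample set.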