[Bioperl-l] Public Release of the Xobjects gene expression package
Nathan O. Siemers
siemersn@bms.com
Sun, 29 Jul 2001 17:40:55 -0400
Folks,
As discussed with some people at BOSC, we are releasing
our core package for manipulating gene expression information,
dubbed "Xobjects". The current tarball for this package can
be downloaded from my personal low-bandwidth server:
http://www.fiveprime.com
I can imagine uploading this to CPAN, but I thought I would
let you chew on it a bit first and let us know what you think.
Xobjects is a fairly simple system that relies on hashes of
gene expression information, produced by "loaders" for the
particular technology (see AffyCHP.pm and ArrayVS.pm in the
distribution for example loading modules). Once these data
are loaded, almost any combination of aggregation,
normalization/scaling, and ratio-taking may be performed. The
objects themselves (hashes of data with associated methods)
may be built up into a tree of relationships that can
accurately and flexibly reflect the particular biological
design of the experiment in question. There is a simple
example in the Xobject.pm POD.
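To make the idea concrete for anyone who has not unpacked the tarball yet, here is a minimal, language-neutral sketch of the pattern described above, written in Python rather than Perl. It is not the Xobjects API: the class ExpressionNode, the function load_tab_delimited, the mean-scaling target of 100, and the probe names are all hypothetical and chosen only for illustration. It shows a loader producing a hash of probe signals, scaling, aggregation of replicates under a parent node, and ratio-taking against a control.

```python
from statistics import mean


class ExpressionNode:
    """Hypothetical stand-in for one object: a probe -> signal hash plus children."""

    def __init__(self, name, data=None, children=None):
        self.name = name
        self.data = dict(data or {})
        self.children = list(children or [])

    def scaled(self, target=100.0):
        """Return a copy whose mean signal equals `target` (simple global scaling)."""
        factor = target / mean(self.data.values())
        return ExpressionNode(self.name, {p: v * factor for p, v in self.data.items()})

    def aggregate(self):
        """Fill this node with the per-probe mean of its children (shared probes only)."""
        shared = set.intersection(*(set(c.data) for c in self.children))
        self.data = {p: mean(c.data[p] for c in self.children) for p in shared}
        return self

    def ratio(self, baseline):
        """Per-probe ratio of this node to `baseline`, skipping zero baselines."""
        shared = self.data.keys() & baseline.data.keys()
        return {p: self.data[p] / baseline.data[p] for p in shared if baseline.data[p] != 0}


def load_tab_delimited(name, lines):
    """Toy loader: one 'probe<TAB>signal' record per line (assumed format)."""
    data = {}
    for line in lines:
        probe, signal = line.rstrip("\n").split("\t")
        data[probe] = float(signal)
    return ExpressionNode(name, data)


# Two treated replicates and one control, mirroring a simple experimental design.
rep1 = load_tab_delimited("treated_1", ["p1\t10", "p2\t30"])
rep2 = load_tab_delimited("treated_2", ["p1\t20", "p2\t20"])
ctrl = load_tab_delimited("control", ["p1\t5", "p2\t50"])

# Scale each replicate, aggregate them under a parent node, then take ratios.
treated = ExpressionNode("treated", children=[rep1.scaled(), rep2.scaled()]).aggregate()
ratios = treated.ratio(ctrl.scaled())
```

The parent node holding the replicates is the smallest possible "tree"; a real design would presumably nest further (time points, doses, tissues) in the same way.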
Once created and transformed, various output methods allow you
to deliver the processed data into Excel, GeneCluster, and
various other formats for further analysis, plus some
primitive web outputs.
Xobjects is released under the terms of the LGPL. If there
are any serious issues with the LGPL and BioPerl, let me know.
Have fun, and please bear with us - I wish I could say that
this was a "perfect" release, but there are of course loose
ends. Here is the list of caveats I can think of:
- This is *not* integrated into the BioPerl Object Models.
- Some of the POD about data structures may be slightly stale,
  but the truth can be gleaned from the code easily.
- To our chagrin, Chart::GnuPlot no longer seems to exist on
  CPAN. Two methods we wrote for creating scatter charts and
  histograms of data depend on it. For now we have commented
  out the affected code so that the software will build on
  systems without Chart::GnuPlot. A replacement for these
  methods should be put in place, and they should be separated
  from the Stats.pm statistical module where they currently
  reside.
- Stats.pm works, but is ugly (my first Perl module, years ago
  now), and needs to be replaced by the newer stat module that
  has finally appeared on CPAN. When Xobjects was first
  written, the public Statistics::Descriptive was not sufficient
  to get our work done.
- We had to rip out some database-specific code that lets us
  load Xobjects from our home-brewed gatc relational database.
  We would be happy to share the methods, but because of
  customizations they may not be functional out of the box.
- We also removed the calls to our internal databases in
  TieGene.pm, the module that retrieves descriptive information
  about probes on the arrays. All you need to do is supply the
  appropriate hashes (from databases, flat files, etc.) for your
  data and you should be up and running.
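As a rough illustration of what "supplying the appropriate hashes" from a flat file might look like, here is a small sketch, again in Python rather than Perl. The layout TieGene.pm actually expects is not described in this message, so the column names (probe_id, symbol, description), the function load_annotations, and the sample records are all assumptions made up for the example.

```python
import csv
import io


def load_annotations(handle):
    """Build a probe -> annotation hash from a tab-delimited file with a header row.

    The column names used here are hypothetical, not taken from TieGene.pm.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["probe_id"]: {"symbol": row["symbol"], "description": row["description"]}
        for row in reader
    }


# An in-memory stand-in for a flat file; the records are invented placeholders.
flat_file = io.StringIO(
    "probe_id\tsymbol\tdescription\n"
    "probe_0001\tGENE_A\thypothetical example gene A\n"
    "probe_0002\tGENE_B\thypothetical example gene B\n"
)
annotations = load_annotations(flat_file)
```

The same hash could just as well be filled from a database query; the point is only that the probe identifier is the key shared with the expression data.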
Let us know of any problems!
Thanks,
nathan
Nathan Siemers
Xin Huang
Donald Jackson
--
Nathan Siemers, Group Leader, Bioinformatics-Applied Genomics
Bristol-Myers Squibb Pharmaceutical Research Institute, Hopewell 3-0.07
P.O. Box 5400, Princeton, NJ 08543-5400, (609)818-6568
nathan.siemers@bms.com