[Bioperl-l] FWD: Mail System Error - Returned Mail

Aaron J Mackey ajm6q@virginia.edu
Fri, 23 Feb 2001 16:18:48 -0500 (EST)


On Fri, 23 Feb 2001, Frank Gibbons wrote:

> Ewan Birney wrote:
> >> So, my questions are:
> >> * Is this appropriate for BioPerl in the first place? Would it be more
> >> suitable for CPAN? The algorithms are general, but my focus is on
> >> BioInformatics.
> >Possibly more suitable for straight CPAN. Not sure. Perhaps CPAN modules,
> >with Bioperl stubs/docs to work things from there. Whatever you
> >feel comfortable with.

I would vote for CPAN, especially considering your statement that they are
general clustering methods.

> Hilmar, it won't be as fast as C, but then clustering is largely an
> exploratory technique, rather than something you run routinely. You do it once
> or twice to see what it gives you, then you go away and perform other kinds of
> analyses/experiments based on your clustering results. Still, I may end up
> having to write some of it in C, if it's unbearably slow. So, better to keep
> it out of BioPerl.

The other issue you'll run into is memory.  For single-linkage clustering,
I've found myself rewriting Matrix::Math::Bool to use upper/lower triangle
storage only, and I still can't cluster the human genome within a smallish
computer's memory.  I think you'll find that with even modest-sized
problems, Perl won't take just a little longer, but a lot longer.  A
postdoc in our group has been clustering about 25,000 members, and her
scripts run for a day on a fast machine.  Granted, it's not single linkage
and there's a bit of computation going on, but that's still pretty slow;
had she written it in C, it would probably complete in an hour or less.
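
To make the triangle-only storage idea concrete, here is a minimal sketch.
It is written in Python rather than Perl, it is not the rewritten module
mentioned above, and the class and method names (TriangularBitMatrix, set,
get) are made up for illustration.  It assumes a symmetric boolean matrix
whose diagonal never needs to be stored, and packs the strict upper
triangle at one bit per pair.

```python
class TriangularBitMatrix:
    """Symmetric boolean matrix that stores only the strict upper triangle.

    Hypothetical illustration: one bit per unordered pair (i, j), i != j.
    """

    def __init__(self, n):
        self.n = n
        self.nbits = n * (n - 1) // 2          # number of unordered pairs
        self.bits = bytearray((self.nbits + 7) // 8)

    def _index(self, i, j):
        # Map an unordered pair to its position in the packed triangle.
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError("index out of range")
        if i == j:
            raise ValueError("diagonal is not stored")
        if i > j:
            i, j = j, i                        # symmetry: always use i < j
        # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) entries in total.
        return i * self.n - i * (i + 1) // 2 + (j - i - 1)

    def set(self, i, j, value=True):
        k = self._index(i, j)
        if value:
            self.bits[k >> 3] |= 1 << (k & 7)
        else:
            self.bits[k >> 3] &= ~(1 << (k & 7)) & 0xFF

    def get(self, i, j):
        k = self._index(i, j)
        return bool(self.bits[k >> 3] & (1 << (k & 7)))


m = TriangularBitMatrix(5)
m.set(3, 1)             # stored once, visible from both orientations
linked = m.get(1, 3)    # True
```

For n = 25,000 this needs n(n-1)/2 = 312,487,500 bits, roughly 39 MB,
against roughly 78 MB for the full n x n bit matrix.  The saving is only a
factor of two and storage still grows quadratically, which is why the
triangle trick alone doesn't rescue genome-sized inputs.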

As always, your mileage may vary.

There is a substantial amount of packaged Fortran and S/R code for
general (statistical/mathematical) clustering.  Perl wrappers around these
beasties would be an excellent way to get into CPAN ...

-Aaron

-- 
 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (804) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o