[Bioperl-l] question about the nature of bioperl

Tue, 20 Aug 2002 19:11:39 -0700 (PDT)

On Tue, 20 Aug 2002, nkuipers wrote:

> >horses for courses.
>
> I am showing my age here.  What does that mean? =)

i guess it means different horses race better on different tracks.

> >is your code available
>
> Regarding the code written by my supervisor, it's not up to me, since that
> integrates rather extensively with our databases and it's not mine to give
> anyway.  As for my own...well, since my work at this time is simple enough, I
> use a series of scripts that I've cobbled together as the need arose.  As
> you've likely guessed, I am somewhat new to Perl, otherwise it would not have
> developed that way.  Anyways, I've been realizing lately the need to unify
> them at some point into a module.  For my current needs, it would do such
> basic things as accept FASTA and strip everything but sequence characters for
> hash operations on redundancy, write back to FASTA, do exact subsequence
> matching with counts versus a flatfile, and provide a simple BLAST parser that
> grabs sequences by user-defined hit-def keywords ('parapoxvirus', 'rhetinoid',
> '7SL RNA' or whatever).  That comes later however; at the moment they work and
> we are pressed for time/results. =)

perl is great for this sort of stuff; you may run into problems with
scalability however - if this system continues to grow then one day you
could end up with a huge bloated flaky system that is incredibly sensitive
to initial conditions; e.g. using underscores in a fasta header at the
beginning of the pipeline has some weird effect way downstream due to some
unwritten assumption some regexp makes.
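
to make the underscore example concrete, here is a minimal sketch, written
in python rather than perl and not taken from anyone's actual pipeline. the
function names and the sample header are invented for illustration. the
first parser carries the kind of unwritten assumption described above (ids
are letters and digits only); the second states its assumption and fails
loudly when it doesn't hold.

```python
import re

# hypothetical FASTA header, invented for illustration
HEADER = ">seq_42 parapoxvirus 7SL RNA"


def fragile_id(header):
    """Extract the sequence id, silently assuming ids are alphanumeric."""
    # unwritten assumption: no underscores, dots or pipes in the id,
    # so "seq_42" is quietly truncated at the underscore
    match = re.match(r">([A-Za-z0-9]+)", header)
    return match.group(1) if match else None


def robust_id(header):
    """Extract the sequence id as everything up to the first whitespace."""
    # stated assumption: the id is the first whitespace-delimited token
    match = re.match(r">(\S+)", header)
    if match is None:
        # fail loudly instead of passing a bad id down the pipeline
        raise ValueError(f"not a FASTA header: {header!r}")
    return match.group(1)


if __name__ == "__main__":
    print(fragile_id(HEADER))
    print(robust_id(HEADER))
```

the fragile version never raises an error, which is exactly why the damage
only shows up several scripts later.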

i think a lot of us in bioperl have worked in places that have some hugely
important piece of legacy code that was originally a quick hack written by
someone long gone; it's a nightmare. that's why so many of us are
attracted to promises of robust object oriented solutions, with strict
interfaces and all the rest, despite the fact that these sometimes get a
little bloated, and pose a steep learning curve to new users. if you use
the core modules in bioperl, you're guaranteed robust code (or at least as
much robustness as you can expect in bioinformatics).

i think there's room for both; quick hacks aren't always bad, so long as
they are regularly purged and don't become legacy. sometimes a single-cpu,
makefile-based pipeline is preferable to a mysql one that has all the bells
and whistles for interacting with a compute farm. often a bunch of
procedural perl scripts strung together with |s is the way to go (so long
as it's clear how to rewrite each and every component).

maybe bioperl is biased towards the use-cases of the core developers, who
often work in big genome centres?

> >a general repository for an ad hoc collection of unsupported scripts and
> >modules that don't fit into the central framework?
>
> I love this idea, though I would be in favor of reviews for redundancy (10
> scripts doing exactly the same thing from different people would not be
> beneficial, although maybe 2-3 scripts doing the same thing in procedural and
> OO ways would be useful) and basic quality of the Perl itself.

if you're volunteering for the reviewing... see, a big part of this is
maintenance - it's easy to get a prototype up and running, but making sure
it's working in a year is a different story. i still think this is a good
idea. in fact all it would take would be a wiki page where anyone can
enter urls pointing to their own scripts/modules - would anyone add
anything? it's worth a shot.

> cheers,
> nathanael kuipers