[Bioperl-l] bioperl-run; size/complexity of bioperl for 1.2

Catherine Letondal letondal@pasteur.fr
Sun, 24 Nov 2002 09:54:56 +0100


Ewan Birney wrote:
> 
[...]
> So... separate cvs modules - good. But... for new users... not having
> "BLAST a sequence" in the first thing they download - Bad.

I fully agree.

> What we are struggling with is that our logical description is cutting
> across our "starting functionality" set. This I am sure is something many
> projects have faced before - does anyone know how they square this circle?
> Does anyone square this circle?
> 
> 
> More practically/importantly, for bioperl-1.2, do we:
> 
> 
>   (a) distributed bioperl-1.2, bioperl-run-1.2 and say "if you want to get
> remote BLAST parsing, you have to download and install both" (I don't like
> this - new users are getting freaked out enough just by installing one of
> these beasts)
> 
>   (b) Have a bioperl-all-1.2.tar.gz, which is everything in which case:
>     - how is it structured internally?
>     - do we do this with cvs aliases or scripts
>     - does bioperl-db come in? bioperl-ext? Oh... vey...
> 
>   (c) Have bioperl-1.2 being actually "starter-pack bioperl" which is a
> merge-and-prune of bioperl-core and bioperl-run (and perhaps others) and
> then distribute bioperl-live as bioperl-core-1.2.tar.gz,
> bioperl-run-1.2.tar.gz etc.
> 

Being partly responsible for the "size" problem (*), I feel a little
uncomfortable saying that, in my opinion, solution (b) is by far the best,
at least when ease of installation for newbies is the issue.

Having to install several parts and understand CPAN-like dependency 
mechanisms may be difficult for the newbie. Do not forget that many
bioperl users do not have any sysadmin in their lab and have to install
bioperl themselves. The installation procedure is an important part of
the "user interface" of a package, and making it more complex for the
sake of simplicity looks somewhat like a paradox.

And yes, bioperl-db should be part of it, as well as bioperl-ext. What is the
benefit of not having these packages in the standard distribution? Size? The
only good reason I see for having a module external to bioperl is
modularity. IMHO, a package should be distributed and installed independently
from bioperl only if it can be used independently from bioperl (taking some
examples from the documentation: third party applications, XML modules, ...).
Conversely, distributing some bioperl-* packages separately means extra
effort for the people who need them: they have to install them on their own,
and they can never be sure they have installed the right version, etc.

Another very important part of the user interface is the documentation! 
There is a "Getting Started" part at the top of the README that
can describe (or provide a link to) this starter pack you are speaking
about. This part would describe this "starting functionality" set
with examples.  In fact, there is already such a document (bptutorial).

--
Catherine Letondal -- Pasteur Institute Computing Center

(*) Regarding the size: in terms of disk space, it's now about 20 MB, isn't it?
... which is nothing for current disks.
It's true that there are many modules, but among the 306 modules of 
bioperl-run, 277 are similar modules in one directory - this is not real 
complexity, is it?