[Biojava-l] Convince us to use BioJava (and come to the next bootcamp)

David Huen smh1008@cus.cam.ac.uk
Fri, 5 Apr 2002 20:33:44 +0100 (BST)


On Fri, 5 Apr 2002, Tom Hudson wrote:

> A research group here at UNCW is starting a couple of bioinformatics 
> projects in Java.  I said, "look here, there's this open-source group on 
> the web that's created a huge amount of code already, let's use it!"
> 
> The responses I've gotten have been on the order of "Eww, 600+ classes. 
> I can write my own parser faster than I can figure out what they're 
> doing." and "We don't need data models anywhere near that complex."  So 
> why should we use BioJava?  The "overview" on the web page hasn't 
> convinced anybody here.

It all depends on what your group intends to do, really.  If all it wants
to do is read some EMBL files and do a few manipulations with them, then
perhaps, yes, you could write a parser faster than going thru' the trouble
of understanding BioJava.  But even in this simple case, I suspect you'll
end up writing that very same parser again and again because you want to
add some functionality that was not envisaged originally.  And debugging
it.  And fixing it when a format change occurs.  And dealing with corner
cases where some source doesn't quite follow the "standard" correctly.
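
To make the "corner cases" point concrete, here is a minimal sketch of
the sort of quick parser I mean.  It is NOT BioJava code.  The class
name FlatFileSketch and the sample records are made up for
illustration, and it only assumes the general EMBL-style layout of a
two-letter line code followed by content, with "//" ending a record.
Note how much of it is already special-casing:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hand-rolled reader for EMBL-style flat files (illustration only). */
class FlatFileSketch {

    /** Groups each record's lines by their two-letter line code. */
    static List<Map<String, List<String>>> parse(String text) {
        List<Map<String, List<String>>> records = new ArrayList<>();
        Map<String, List<String>> current = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("//")) {            // record terminator
                    if (!current.isEmpty()) records.add(current);
                    current = new LinkedHashMap<>();
                    continue;
                }
                if (line.length() < 2) continue;        // corner case: blank or truncated line
                String code = line.substring(0, 2);
                if (code.equals("XX")) continue;        // corner case: spacer line
                // corner case: a line code with no content after it
                String body = line.length() > 5 ? line.substring(5).trim() : "";
                current.computeIfAbsent(code, k -> new ArrayList<>()).add(body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // corner case: the last record is missing its "//" terminator
        if (!current.isEmpty()) records.add(current);
        return records;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "ID   X1", "XX", "DE   first line", "DE   second line", "//",
                "", "ID   X2");
        List<Map<String, List<String>>> records = parse(text);
        System.out.println(records.size() + " records");
        System.out.println(records.get(0).get("DE"));
    }
}
```

This does nothing yet about feature tables, locations or the sequence
itself, which is where the real work (and the rewriting) starts.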

If you wish to do more than that, the balance will shift rapidly.  You may
want to begin with just a sequence and some analysis on it and mark out
some features.  Fine, it's not so difficult to hack out something like
that.  Then you want to do it over a set of sequences comprising a contig:
your code needs support for assemblies then. You write that.  You want to
visualise it.  Write some renderers.  You want persistent data objects
that can be stored to a SQL DB.  Write some more code.  You want data from
some other site in some other format and different coordinate system.
More code.  The unspeakable ******** at the other end points you to his
DAS server and says get the stuff you want from there.  Write a DAS
client.  Your transcripts have exons (voila! nested feature).  Your gene
has multiple transcripts (hey! another nested feature).  Need
translations.  More code.  Need dynamic programming for HMM
implementation.  More code.  Need to do Blast/Fasta/HMMer output
parsing? More code.  By this stage, you might as well have rewritten
BioJava.  Which gives you all this out of the box.  Today.
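
For anyone unsure what "nested feature" means in practice, a rough
sketch follows.  Again this is NOT BioJava's API: the Feature class,
its methods and the coordinates are all hypothetical, and the only
assumption is that a feature is a typed, 1-based inclusive range that
may contain child features.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical nested feature (illustration only, not a BioJava class). */
class Feature {
    private final String type;
    private final int start;   // 1-based, inclusive
    private final int end;     // 1-based, inclusive
    private final List<Feature> children = new ArrayList<>();

    Feature(String type, int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        this.type = type;
        this.start = start;
        this.end = end;
    }

    /** Adds a child that must lie within this feature; returns the child. */
    Feature addChild(Feature child) {
        if (child.start < start || child.end > end) {
            throw new IllegalArgumentException("child lies outside parent");
        }
        children.add(child);
        return child;
    }

    List<Feature> getChildren() { return Collections.unmodifiableList(children); }

    int length() { return end - start + 1; }

    /** Counts descendants of the given type at any depth. */
    int countDescendants(String wanted) {
        int n = 0;
        for (Feature c : children) {
            if (c.type.equals(wanted)) n++;
            n += c.countDescendants(wanted);
        }
        return n;
    }

    /** Sums the lengths of direct children of the given type. */
    int totalChildLength(String wanted) {
        int total = 0;
        for (Feature c : children) {
            if (c.type.equals(wanted)) total += c.length();
        }
        return total;
    }
}

class NestedFeatureDemo {
    /** One gene with two transcripts, each holding its own exons. */
    static Feature buildGene() {
        Feature gene = new Feature("gene", 100, 900);
        Feature t1 = gene.addChild(new Feature("transcript", 100, 900));
        t1.addChild(new Feature("exon", 100, 250));
        t1.addChild(new Feature("exon", 400, 520));
        t1.addChild(new Feature("exon", 800, 900));
        Feature t2 = gene.addChild(new Feature("transcript", 100, 520));
        t2.addChild(new Feature("exon", 100, 250));
        t2.addChild(new Feature("exon", 400, 520));
        return gene;
    }

    public static void main(String[] args) {
        Feature gene = buildGene();
        System.out.println("transcripts: " + gene.countDescendants("transcript"));
        System.out.println("exons: " + gene.countDescendants("exon"));
    }
}
```

A flat list of (type, start, end) rows cannot answer "which exons
belong to which transcript" without extra bookkeeping, which is why
the nesting ends up in the data model sooner or later.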

You need to exchange data with other labs? That don't use Java?  Fine, OBF
projects have interoperability to a fair degree between
Perl/Java/Ruby/Python/etc.

So anyway, I would say that it is difficult to predict what you need at
the start, and as you develop more and more custom code, it gets harder
and harder to move to another system because of the legacy you have
created.  That legacy in turn commits you to even more coding rather than
the research the coding is intended to support.

> 
> (Caveats: right now we're a bunch of computer people and a bunch of 
> biologists, with nobody really cross-trained; I understand that some day 
> the biologists may start asking questions that require a nested-feature 
> view of the world, but haven't convinced the other computer people to 
> plan for that day yet, and the biologists can't think of any right now.)
> 

Don't your biologists have any imagination?  They should really be
giving you guys a harder time. :-)

Might I suggest that it's not very wise asking biologists to specify
software?  (Speaking as a biologist who codes too.)  Look at what they
do and try to get to the bottom of the question they are really asking.
It's not software they want but solutions and what you'll really want to
do is offer them solutions that happen to be software. Otherwise, you'll
just end up rewriting software repeatedly as those 5%%$%$^%$%$ biologists
get you with feature creep (Sheesh, don't you guys read Dilbert? At this
rate, you might as well ask politicians to do sincerity... ;-) ).

> On a closely related topic, I want to attend a BioJava Boot Camp.  This 
> year's appears to have been announced a month and a half ahead. 
> Unfortunately, for some US federal grants, I have to schedule my travel 
> a year ahead and more.  When's the next Boot Camp?
> 
I don't think it's planned way in advance.  So far, it's been annual
(this year's is only the second time it's been held).  I
suspect it could be more often and even elsewhere in the world if a)
there's the demand, b) there's a local organiser who can fix up
accommodation for attendees and instructors, do admin and organisation,
liaise with local buildings management people/ networks & IT people,
collect money, do accounting, etc., c) there's access to a teaching
facility with enough computers and networking to do instruction in, and
d) there are enough instructors (all the current ones do something else
for a living...).  So far, the only place where all these conditions have
been fulfilled is once a year at the EBI.  (N.B. I'm not an authoritative
source on this but I think it is probably correct.  Personally I can't see
it being held more than twice a year at most because of (d).)

OTOH, there's work behind the scenes to improve online instruction
resources so the need for a formal course may be reduced soon.

Regards,
David Huen