[Biojava-dev] The future of BioJava

Richard Holland holland at ebi.ac.uk
Sun Sep 23 11:16:14 UTC 2007


Understood.

I was thinking of including a 'compatibility mode' module in BJ3 which
provides all the existing BJ2 interfaces and maps them to the new ones.
This way we have the best of both worlds - existing projects would replace
the BJ2 jars with the new BJ3 jars on the classpath plus the compatibility
jar and wouldn't need to change any code at all. Existing import
statements would then pick up the compatibility mappings instead of the
original classes.
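
To make the idea concrete, here is a rough sketch of what one such mapping
might look like. Everything in it is hypothetical: LegacyFeature, NewFeature,
Strand, SimpleFeature and LegacyFeatureAdapter are made-up stand-ins, not
real BJ2 or BJ3 types, and the real compatibility jar (if it ever exists)
may well be designed differently. The point is only that the old interface
keeps its name and signatures while delegating to a new-style object.

```java
// Hypothetical sketch only: none of these types exist in BioJava.

// Stand-in for a redesigned, new-style type.
enum Strand { FORWARD, REVERSE, UNKNOWN }

interface NewFeature {
    int start();
    int end();
    Strand strand();
}

// Stand-in for an old-style interface that existing client code imports.
interface LegacyFeature {
    int getMin();
    int getMax();
    char getStrandToken(); // '+', '-' or '.'
}

// Would live in the compatibility jar: implements the old interface
// by delegating every call to a new-style object.
final class LegacyFeatureAdapter implements LegacyFeature {
    private final NewFeature delegate;

    LegacyFeatureAdapter(NewFeature delegate) {
        this.delegate = delegate;
    }

    @Override
    public int getMin() {
        return Math.min(delegate.start(), delegate.end());
    }

    @Override
    public int getMax() {
        return Math.max(delegate.start(), delegate.end());
    }

    @Override
    public char getStrandToken() {
        switch (delegate.strand()) {
            case FORWARD: return '+';
            case REVERSE: return '-';
            default:      return '.';
        }
    }
}

// Trivial new-style implementation used for the demonstration.
final class SimpleFeature implements NewFeature {
    private final int start;
    private final int end;
    private final Strand strand;

    SimpleFeature(int start, int end, Strand strand) {
        this.start = start;
        this.end = end;
        this.strand = strand;
    }

    @Override public int start() { return start; }
    @Override public int end() { return end; }
    @Override public Strand strand() { return strand; }
}

class CompatibilityDemo {
    public static void main(String[] args) {
        // Old client code only ever sees the LegacyFeature interface.
        LegacyFeature f =
            new LegacyFeatureAdapter(new SimpleFeature(10, 42, Strand.REVERSE));
        System.out.println(f.getMin() + ".." + f.getMax() + " " + f.getStrandToken());
    }
}
```

Code written against LegacyFeature compiles and runs unchanged here, while
all the actual state lives in the new-style object behind the adapter.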

Anyhow, your comments will definitely be considered. This is very early
discussion after all - the point being to gather opinion and ideas to see
if it's even worth making a change, let alone what kind of change that
would be.

cheers,
Richard

PS. The most fundamental problem is that some of the existing interfaces
are broken. They enforce situations which are not biologically logical -
e.g. the feature and location interfaces have got strand mixed up. You
can't fix this without altering the interfaces - and to alter the
interfaces requires people to change existing code. If they're going to
change existing code, why not make a clean sweep of it? Even deprecating
for one release then removing in a subsequent one will still require you
to change the 1500+ classes you mention, which only delays the
problem.

PPS. I will compile a comprehensive list of things I think are
broken/wrong so that people can discuss specifically what should be done
about them - whether by rewrite or by modification. I do want this to be
a democratic process and if the majority of people don't want a particular
plan of action to happen, then it won't.



On Sat, September 22, 2007 5:35 pm, george waldon wrote:
> Richard,
>
> You cannot kill biojava and it is not vista; you cannot force people to
> use it. I have a project with hundreds of classes using biojava and
> working without a glitch and the choice of either keeping with it or
> switching to a bj3 in the middle of a rewrite of around 1500 classes that
> may take months or years to complete. I may just never switch to the new
> biojava. Most likely, a lot of people are going to be in a similar
> situation and most likely bj3 will also have to support old biojava
> classes - great!
>
> I agree that you cannot change interfaces but you can deprecate them and
> toss them after one release cycle or put them into a deprecated module
> that is not included in releases.
>
> The question becomes: what are the fundamental problems of biojava that
> truly justify a rewrite from the ground up? Certainly, the need for a new
> symbol model could be one; maintenance and testing are not; modular
> structure is not; and use of generics is not - they do not break old code.
>
> George
>
>
>> -----Original Message-----
>> From: Richard Holland
>> To: george waldon
>> Cc: biojava-dev at biojava.org
>> Sent: 9/21/2007 12:54 AM
>> Subject: Re: [Biojava-dev] The future of BioJava
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi George.
>>
>> By 'stop development' I really meant just that active development
>> efforts would be focused on the new codebase rather than modifying the
>> existing one (except of course for fixing bugs, which is always
>> important and we wouldn't stop doing that until the new codebase was
>> well established as an alternative).
>>
>> I agree that modifying the existing codebase would improve many of the
>> problems currently experienced with it - code abstraction being just one
>> of them. BioJavaX was an attempt at doing this. The big stumbling block
>> was interfaces - users do not expect interfaces to change as it breaks
>> all code that already uses that interface. They also do not expect the
>> defined behaviour of methods in interfaces to change - which meant, for
>> instance, that I had real problems trying to get
>> RichFeature/RichLocation and RichLocation/Location to match up as some
>> parts of Feature and Location conflicted with the more realistic
>> requirements of their Rich* equivalents (e.g. circularity).
>>
>> If you change interfaces, you might as well start from scratch in terms
>> of the effect it has on end users' code. Also, if we start from scratch,
>> it allows us to build up from the very basics the kind of robustness and
>> flexibility we need throughout the system. As mentioned in the original
>> posting the existing system is heavily sequence-focused, meaning that
>> even the simple task of scanning a set of features cannot be done
>> without also loading the associated sequences because the two are so
>> closely integrated. We need to make it much more flexible and I think
>> new code would give us a better opportunity to do so without being tied
>> into complying with existing interfaces or behaviour expectations.
>>
>> Having said that, I do expect large parts of the new codebase to be only
>> slightly modified copies of the original code, particularly regarding
>> recent developments such as genetic algorithms and phylogenetics. It
>> would be silly to write such logic all over again where the code is
>> relatively self-contained.
>>
>> cheers,
>> Richard
>>
>>
>>
>> george waldon wrote:
>> > Hello,
>> >
>> > All this is very exciting. I would certainly contribute to something
>> > like that. A few remarks that come to my mind while reading all these
>> > emails.
>> >
>> > I noticed that the tutorial has seriously improved – thanks for the
>> > work. I remember my initial steps towards understanding Symbol and
>> > cross-alphabets (…)  Still, from time to time, I have difficulties
>> > with basic things that are not intuitive to me such as “token”, e.g.
>> > Alphabet.getTokenization(“token”) or
>> > SymbolTokenization.tokenizeSymbolList(SymbolList).
>> >
>> > I am surprised by all the requests to use String instead of
>> > SymbolList. The CookBook tells precisely, and with code examples, how to
>> > perform most basic operations. Maybe someone could illustrate the
>> > new kind of code versus the old one? I bet many newbies (and older ones)
>> > actually get their answers in the Cookbook.
>> >
>> > Richard wrote:
>> >> It is suggested that development stops on the existing Biojava(…)
>> > Well, I don’t think the license can let you do that :-)
>> > Writing new code might be easier but certainly making old code better
>> > will improve the level of code abstraction. Therefore I am promoting
>> > improving existing Biojava code versus hazardous code rewrite. I can see
>> > some of the initial steps on the roadmap:
>> > - Switch to Subversion repository
>> > - Change of the build process compatible with creation of modules
>> > - Improving testing frame (mentioned several times)
>> > - Creation of white papers for coding practices, build releases,
>> >   (others?)
>> >
>> > Then maybe the proper work of restructuring Biojava may start. We can
>> > either divide the existing mammoth into multiple modules at first or -
>> > my preference – build modules one by one by selectively picking
>> > classes. This way it will be easy to find out classes that can be
>> > deprecated (by lack of users) and we can even have a deprecated module
>> > at the end. Some coupling may need to loosen up. We will also need a
>> > list of API changes for developers who will use the newer version.  I am
>> > sure that the kind of data structures proposed by Richard could find
>> > their place as well as some of the proposed patterns (beans, others?)
>> >
>> > Anyway, all these are simple ideas. I am not an expert in build
>> > process, but I can help with improving javadocs, writing examples and
>> > test cases. I have also a fair knowledge of the molecular biology
>> > package.
>> >
>> > Hope it helps,
>> > George
>> >
>> > _______________________________________________
>> > biojava-dev mailing list
>> > biojava-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFG83jK4C5LeMEKA/QRAtOFAJsF9YNdgdsOm1KY65GyRehsO1ElYwCfeUfi
>> yXWTMXSzn3mXZqXXo9999rw=
>> =WbAQ
>> -----END PGP SIGNATURE-----
>


-- 
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK



