[Biojava-dev] The future of BioJava

Fri Sep 21 07:54:51 UTC 2007

Hi George.

By 'stop development' I really meant just that active development
efforts would be focused on the new codebase rather than modifying the
existing one (except of course for fixing bugs, which is always
important and we wouldn't stop doing that until the new codebase was
well established as an alternative).

I agree that modifying the existing codebase would alleviate many of the
problems currently experienced with it - code abstraction being just one
of them. BioJavaX was an attempt at doing this. The big stumbling block
was interfaces - users do not expect interfaces to change, because a
change breaks all code that already uses that interface. They also do
not expect the defined behaviour of methods in interfaces to change -
which meant, for instance, that I had real problems trying to get
RichFeature/Feature and RichLocation/Location to match up, as some
parts of Feature and Location conflicted with the more realistic
requirements of their Rich* equivalents (e.g. circularity).
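
To make the circularity conflict concrete, here is a minimal sketch in
Java. All the types in it are hypothetical and much simpler than the
real Feature/Location interfaces; the assumption is only that existing
user code relies on a documented "min <= max" contract, which a location
wrapping past the origin of a circular molecule cannot honour.

```java
// Hypothetical, simplified types for illustration; NOT the real BioJava API.
interface Location {
    int getMin(); // assumed documented contract: getMin() <= getMax()
    int getMax();
}

final class LinearLocation implements Location {
    private final int min;
    private final int max;

    LinearLocation(int min, int max) {
        this.min = min;
        this.max = max;
    }

    public int getMin() { return min; }
    public int getMax() { return max; }
}

// A location on a circular molecule that may wrap past the origin,
// e.g. 90..10 on a sequence of length 100.
final class CircularLocation implements Location {
    private final int start;
    private final int end;
    private final int seqLength;

    CircularLocation(int start, int end, int seqLength) {
        this.start = start;
        this.end = end;
        this.seqLength = seqLength;
    }

    public int getMin() { return start; } // may exceed getMax(): contract broken
    public int getMax() { return end; }

    // Length that accounts for wrapping past the origin.
    int length() {
        return start <= end ? end - start + 1 : (seqLength - start + 1) + end;
    }
}

final class LegacyClient {
    // Stands in for existing user code written against the old contract.
    static int length(Location loc) {
        return loc.getMax() - loc.getMin() + 1;
    }
}

class LocationContractDemo {
    public static void main(String[] args) {
        CircularLocation wrapped = new CircularLocation(90, 10, 100);
        System.out.println(LegacyClient.length(new LinearLocation(10, 20))); // 11
        System.out.println(LegacyClient.length(wrapped)); // -79, silently wrong
        System.out.println(wrapped.length());             // 21
    }
}
```

The old client still compiles against the new implementation but returns
nonsense for it, which is the kind of behavioural break described above.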

If you change interfaces, you might as well start from scratch in terms
of the effect it has on end users' code. Also, if we start from scratch,
it allows us to build up from the very basics the kind of robustness and
flexibility we need throughout the system. As mentioned in the original
posting, the existing system is heavily sequence-focused, meaning that
even the simple task of scanning a set of features cannot be done
without also loading the associated sequences because the two are so
closely integrated. We need to make it much more flexible, and I think
new code would give us a better opportunity to do so without being tied
into complying with existing interfaces or behaviour expectations.
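
As a rough illustration of what "less sequence-focused" could mean, the
sketch below lets features refer to their sequence by identifier only,
so that a scan over features never touches sequence data. Every name in
it (FeatureRecord, LazySequenceStore, FeatureScan) is made up for this
example and is not part of BioJava or of any agreed design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical types for illustration only; none of these exist in BioJava.
final class FeatureRecord {
    final String sequenceId; // reference by identifier, not an object link
    final String type;
    final int start;         // 1-based, inclusive
    final int end;           // 1-based, inclusive

    FeatureRecord(String sequenceId, String type, int start, int end) {
        this.sequenceId = sequenceId;
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

// Loads a sequence only the first time it is asked for, then caches it.
final class LazySequenceStore {
    private final Function<String, String> loader;
    private final Map<String, String> cache = new HashMap<>();
    int loadCount = 0; // exposed so the demo can show when loading happens

    LazySequenceStore(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String id) {
        return cache.computeIfAbsent(id, key -> {
            loadCount++;
            return loader.apply(key);
        });
    }
}

final class FeatureScan {
    // Scanning needs only the feature records, never the sequences.
    static List<FeatureRecord> ofType(List<FeatureRecord> features, String type) {
        List<FeatureRecord> hits = new ArrayList<>();
        for (FeatureRecord f : features) {
            if (f.type.equals(type)) {
                hits.add(f);
            }
        }
        return hits;
    }

    // Residues are fetched on demand, for the features that need them.
    static String residues(FeatureRecord f, LazySequenceStore store) {
        return store.get(f.sequenceId).substring(f.start - 1, f.end);
    }
}

class FeatureScanDemo {
    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>(); // stands in for a file or database
        disk.put("seq1", "ACGTACGT");
        disk.put("seq2", "TTGACCAA");
        LazySequenceStore store = new LazySequenceStore(disk::get);

        List<FeatureRecord> features = Arrays.asList(
                new FeatureRecord("seq1", "gene", 1, 4),
                new FeatureRecord("seq1", "CDS", 2, 3),
                new FeatureRecord("seq2", "gene", 3, 6));

        List<FeatureRecord> genes = FeatureScan.ofType(features, "gene");
        System.out.println(genes.size() + " genes, loads so far: " + store.loadCount);
        System.out.println(FeatureScan.residues(genes.get(1), store)
                + ", loads so far: " + store.loadCount);
    }
}
```

The design choice being illustrated is simply that the feature model
holds an identifier rather than a sequence object; how identifiers are
resolved is left open.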

Having said that, I do expect large parts of the new codebase to be only
slightly modified copies of the original code, particularly regarding
recent developments such as genetic algorithms and phylogenetics. It
would be silly to write such logic all over again where the code is
relatively self-contained.

cheers,
Richard

george waldon wrote:
> Hello,
> 
> All this is very exciting. I would certainly contribute to something like that. A few remarks that come to my mind while reading all these emails.
> 
> I noticed that the tutorial has seriously improved - thanks for the work. I remember my initial steps going into understanding Symbol and cross-alphabets (…)  Still, from time to time, I have difficulties with basic things that are not intuitive to me, such as "token", e.g. Alphabet.getTokenization("token") or SymbolTokenization.tokenizeSymbolList(SymbolList).
> 
> I am surprised by all the requests to use String instead of SymbolList. The CookBook explains precisely, and with code examples, how to perform most basic operations. Maybe someone could illustrate the new kind of code versus the old one? I bet many newbies (and older ones) actually get their answers in the Cookbook.
> 
> Richard wrote:
>> It is suggested that development stops on the existing Biojava(…)
> Well, I don’t think the license can let you do that :-)  
> Writing new code might be easier, but making old code better will certainly improve the level of code abstraction. Therefore I am in favour of improving the existing Biojava code rather than a hazardous rewrite. I can see some of the initial steps on the roadmap:
> - Switch to a Subversion repository
> - Change the build process so that it is compatible with the creation of modules
> - Improve the testing framework (mentioned several times)
> - Create white papers for coding practices, build releases, (others?)
> 
> Then maybe the proper work of restructuring Biojava may start. We can either divide the existing mammoth into multiple modules at first or - my preference - build modules one by one by selectively picking classes. This way it will be easy to find classes that can be deprecated (for lack of users), and we can even have a deprecated module at the end. Some coupling may need to be loosened. We will also need a list of API changes for developers who will use the newer version.  I am sure that the kind of data structures proposed by Richard could find their place, as well as some of the proposed patterns (beans, others?)
> 
> Anyway, these are all simple ideas. I am not an expert in build processes, but I can help with improving javadocs, writing examples and test cases. I also have a fair knowledge of the molecular biology package.
> 
> Hope it helps,
> George
> 