[Biojava-dev] BioJava 3 Begins - Volunteers please!

Thu Oct 23 05:12:07 UTC 2008

Sorry, I'm a bit late to the game.  Hope I didn't miss anything
exciting yet!

Would it be better to commit this to trunk, and put the current codebase
out to pasture on a branch?

Is it possible (or desirable) to send SVN commit messages to the dev
mailing list?  Alternatively, should someone create a project entry for
biojava on CIA.vc?

http://cia.vc

As soon as I can remember my dev.open-bio.org password I'll start
committing stuff; otherwise I'll post patches to Bugzilla.

   michael

On Mon, 20 Oct 2008, Richard Holland wrote:

> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether by copying and pasting existing classes and modifying them to suit
> the new style, or by writing completely new ones to provide equivalent
> functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There are going to be
> clever tricks when it comes to converting data between different object
> models (e.g. Genbank to INSDSeq), which I will explain later when I put the
> file parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to drop support for the existing BJ code, plenty of
> notice will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
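
To make the object-model advice in the quoted message a bit more concrete
(normalised, ordered comments held as a collection of comment objects, and
the sequence kept as a plain string), here is a minimal sketch of what such
a model might look like. All names here (GenbankRecord, Comment,
ObjectModelSketch, the accession field) are hypothetical and are not taken
from the biojava3 branch; treat it as one possible reading of the guidelines
rather than the intended design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Tiny demo driver; all class names in this sketch are hypothetical.
class ObjectModelSketch {
    public static void main(String[] args) {
        GenbankRecord record = new GenbankRecord("EXAMPLE1", "acgtacgt");
        record.addComment(new Comment("first comment"));
        record.addComment(new Comment("second comment"));
        // Comments come back in the order they were added.
        for (Comment c : record.getComments()) {
            System.out.println(c.getText());
        }
    }
}

// Hypothetical comment object: just the raw text as it appeared in the file.
class Comment {
    private final String text;

    Comment(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }
}

// Hypothetical Genbank-like record. It depends only on basic types and
// keeps the sequence as a plain String instead of parsing it.
class GenbankRecord {
    private final String accession;
    private final String sequence;
    // A List preserves insertion order, so comments stay in file order.
    private final List<Comment> comments = new ArrayList<Comment>();

    GenbankRecord(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    void addComment(Comment comment) {
        comments.add(comment);
    }

    String getAccession() {
        return accession;
    }

    String getSequence() {
        return sequence;
    }

    // Read-only view so callers cannot reorder or drop comments by accident.
    List<Comment> getComments() {
        return Collections.unmodifiableList(comments);
    }
}
```

The List is there because the quoted message says comment order matters; if
order were irrelevant for some data type, a different collection could be
used instead.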