[Biojava-dev] The future of BioJava

Mark Schreiber markjschreiber at gmail.com
Fri Sep 21 07:24:23 UTC 2007


Hello -

Just to clarify my opinion on Strings vs Symbols.

I generally prefer Symbols and SymbolLists to Strings because
SymbolLists are smart and Strings are dumb. The classic case is
ambiguity symbols like 'W': BioJava knows that, in the context of DNA,
this is A or T. However, I think it would be vastly simpler if there
were getters and setters for SymbolLists that exposed Strings in a
friendlier manner.
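
To illustrate the 'W' case, here is a minimal sketch in plain Java. It
does not use the BioJava API; the class DnaAmbiguity and its methods
are hypothetical names, and only a subset of the IUPAC nucleotide
codes is covered.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a BioJava class: shows what a "smart"
// symbol knows that a plain char in a String does not.
final class DnaAmbiguity {

    // Returns the bases an IUPAC code stands for (subset of the codes only).
    static String basesFor(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': return "T";
            case 'W': return "AT";   // weak: A or T
            case 'S': return "CG";   // strong: C or G
            case 'R': return "AG";   // purine
            case 'Y': return "CT";   // pyrimidine
            case 'N': return "ACGT"; // any base
            default:
                throw new IllegalArgumentException("Not a supported DNA code: " + code);
        }
    }

    // Expands a code into the set of concrete bases it may represent.
    static Set<Character> expand(char code) {
        Set<Character> bases = new TreeSet<>();
        for (char b : basesFor(code).toCharArray()) {
            bases.add(b);
        }
        return bases;
    }

    // True if the concrete base is one of those the code allows.
    static boolean matches(char code, char base) {
        return expand(code).contains(Character.toUpperCase(base));
    }
}
```

With a plain String, 'W' == 'A' is simply false; here
DnaAmbiguity.matches('W', 'A') is true and matches('W', 'G') is false.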

I also think there is a case for SymbolLists that are backed by
Strings (more likely a char[]) instead of Symbol arrays and only do
the conversion when it is required (i.e., when the user calls
symbolAt()). These would be ideal for the case where someone is
converting GenBank to Fasta and there is no need to go through the
Symbol parsing.
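
A rough sketch of that idea, in plain Java rather than against the
real BioJava interfaces. LazyBase stands in for Symbol, and
CharBackedSequence and seqString() are hypothetical names used only
for illustration.

```java
// Stand-in for BioJava's Symbol; hypothetical, unambiguous DNA only.
enum LazyBase { A, C, G, T }

// Hypothetical char[]-backed sequence: nothing is parsed on construction.
final class CharBackedSequence {
    private final char[] chars;

    CharBackedSequence(String sequence) {
        // Just copy the characters; no validation or Symbol lookup here.
        this.chars = sequence.toCharArray();
    }

    int length() {
        return chars.length;
    }

    // Converts a single position on demand. The index is 1-based,
    // following the SymbolList convention.
    LazyBase symbolAt(int index) {
        if (index < 1 || index > chars.length) {
            throw new IndexOutOfBoundsException("Index out of range: " + index);
        }
        char c = Character.toUpperCase(chars[index - 1]);
        // Throws IllegalArgumentException if c is not a known base,
        // so bad input is only detected when it is actually looked at.
        return LazyBase.valueOf(String.valueOf(c));
    }

    // A format converter can read the characters back without any parsing.
    String seqString() {
        return new String(chars);
    }
}
```

A GenBank to Fasta converter would only ever call seqString(), so it
would never pay for the per-symbol conversion. The trade-off is that
invalid characters are not detected until symbolAt() touches them.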

Finally, I think SymbolLists (or whatever they get called) should
implement more of the methods found in String to make them look more
like Strings.  Ideally we should think about implementing some of the
methods that Groovy uses for operator overloading. If we do this, it
would be possible to concatenate two sequences in Groovy by doing
this (I may have the syntax wrong):

Seq3 = Seq1 + Seq2
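
On the Java side this mostly comes down to method naming: Groovy
resolves a + b to a.plus(b) and a[i] to a.getAt(i). A minimal sketch,
where PlusSeq is a hypothetical class and not part of BioJava:

```java
// Hypothetical immutable sequence whose method names follow the
// conventions Groovy uses for operator overloading.
final class PlusSeq {
    private final String residues;

    PlusSeq(String residues) {
        this.residues = residues;
    }

    // Groovy maps "seq1 + seq2" to seq1.plus(seq2).
    public PlusSeq plus(PlusSeq other) {
        return new PlusSeq(this.residues + other.residues);
    }

    // Groovy maps "seq[i]" to seq.getAt(i); 0-based here, like String.
    public char getAt(int index) {
        return residues.charAt(index);
    }

    public int length() {
        return residues.length();
    }

    @Override
    public String toString() {
        return residues;
    }
}
```

From Java the same call is written out as seq1.plus(seq2), so nothing
is lost for Java users by choosing these names.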

The other issue with SymbolLists is that they are not intuitive to
construct because they are not very bean-like. This is not just a
problem for newbies but also a major hindrance to the use of JEE,
Spring, JAXB and other important frameworks. It should be possible to
do this:

SymbolList sl = new SymbolList();
sl.setName("AB123456");
sl.setSequence(seqString);
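
For that to work the class needs a public no-argument constructor and
matching get/set pairs. A minimal sketch, assuming a hypothetical
SequenceBean class (in real use it would be declared public in its
own file):

```java
import java.io.Serializable;

// Hypothetical bean-style sequence holder, not a BioJava class.
class SequenceBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;
    private String sequence;

    // Frameworks such as Spring and JAXB instantiate beans through
    // a no-argument constructor and then call the setters.
    public SequenceBean() {
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getSequence() {
        return sequence;
    }

    public void setSequence(String sequence) {
        this.sequence = sequence;
    }
}
```

The cost is that the object is mutable and can exist in a half-built
state (a name but no sequence), which immutable SymbolLists avoid.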

The final hindrance to the use of JEE is serialization. If we keep
Symbols flyweight (singleton) we need to make this bulletproof from
the start. It is also practically impossible to make something both a
bean and a Singleton, so some careful thought is required.  If we keep
symbols behind the scenes they may not need to be so bean-like.
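
One standard Java way to keep flyweights unique across serialization
is readResolve(). The sketch below is only an illustration of that
technique; FlyweightSymbol, forName() and roundTrip() are hypothetical
names, not existing BioJava code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical flyweight: at most one instance per name in the JVM.
final class FlyweightSymbol implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Map<String, FlyweightSymbol> POOL = new ConcurrentHashMap<>();

    private final String name;

    private FlyweightSymbol(String name) {
        this.name = name;
    }

    // The only way to obtain an instance; reuses the pooled one if present.
    static FlyweightSymbol forName(String name) {
        return POOL.computeIfAbsent(name, FlyweightSymbol::new);
    }

    String getName() {
        return name;
    }

    // Called by Java serialization after the object has been read;
    // swaps the freshly created copy for the pooled instance.
    private Object readResolve() {
        return forName(name);
    }

    // Serializes and deserializes in memory, for demonstration only.
    static FlyweightSymbol roundTrip(FlyweightSymbol symbol) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(symbol);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FlyweightSymbol) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With readResolve() in place, a deserialized symbol is the same object
as the pooled one, so == comparisons keep working. Note the private
constructor: this is exactly what makes the class not a bean.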

- Mark

On 9/21/07, george waldon <gwaldon at geneinfinity.org> wrote:
> Hello,
>
> All this is very exciting. I would certainly contribute to something like that. A few remarks that come to my mind while reading all these emails.
>
> I noticed that the tutorial has seriously improved – thanks for the work. I remember my initial steps toward understanding Symbol and cross-alphabets (…)  Still, from time to time, I have difficulties with basic things that are not intuitive to me such as "token", e.g. Alphabet.getTokenization("token") or SymbolTokenization.tokenizeSymbolList(SymbolList).
>
> I am surprised by all the requests to use String instead of SymbolList. The Cookbook explains precisely, and with code examples, how to do most of the basic operations. Maybe someone could illustrate the new kind of code versus the old one? I bet many newbies (and older users) actually get their answers from the Cookbook.
>
> Richard wrote:
> >It is suggested that development stops on the existing Biojava(…)
> Well, I don't think the license can let you do that :-)
> Writing new code might be easier, but making old code better will certainly improve the level of code abstraction. Therefore I am promoting improving the existing Biojava code over a hazardous code rewrite. I can see some of the initial steps on the roadmap:
> - Switch to a Subversion repository
> - Change the build process to be compatible with the creation of modules
> - Improve the testing framework (mentioned several times)
> - Create white papers for coding practices, build releases, (others?)
>
> Then maybe the proper work of restructuring Biojava may start. We can either divide the existing mammoth into multiple modules at first or - my preference – build modules one by one by selectively picking classes. This way it will be easy to find classes that can be deprecated (for lack of users) and we can even have a deprecated module at the end. Some coupling may need to be loosened. We will also need a list of API changes for developers who will use the newer version.  I am sure that the kind of data structures proposed by Richard could find their place, as well as some of the proposed patterns (beans, others?)
>
> Anyway, these are all simple ideas. I am not an expert in build processes, but I can help with improving javadocs and writing examples and test cases. I also have a fair knowledge of the molecular biology package.
>
> Hope it helps,
> George
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>



