[Biojava-l] Refactoring the org.biojava.bio.seq package

Matthew Pocock mrp@sanger.ac.uk
Tue, 11 Apr 2000 14:00:51 +0100


Dear all,

I have just posted the new BioJava API to the BioJava web site. I will also
deposit biojava.jar and biojava-live.tar.gz for those of you without CVS
access. The docs are now built to show only the public and protected API,
which makes them a lot smaller and easier to read.

The org.biojava.bio.seq package has become schizophrenic and has too many
interfaces (24?) to browse easily. Also, org.biojava.bio.seq.tools is
becoming a dumping ground for miscellaneous useful classes. The original
intent was that the seq package would be the home for biologically (or
bioinformatics) motivated classes and concepts. However, it currently also
contains all of the Residue/Alphabet stuff.

Another point that has been raised with me several times in person is that
the name 'Residue' is confusing to people: it sounds too much like it only
pertains to proteins. After discussions between Ewan Birney, Kim Rutherford,
Michele Clamp, Thomas Down and myself, we decided that it would be good to
rename all the Residue* classes to Symbol*, and to change
Residue.getSymbol() to Symbol.getToken() (to stop you saying
symbol.getSymbol(), which looks silly). Also, Alphabet.getParser("symbol")
would change to Alphabet.getParser("token").
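
To make the rename concrete, here is a rough sketch of what calling code
might look like afterwards. It is not the real BioJava source: the two-method
interface, the char return type of getToken(), and the SimpleSymbol and
SymbolRenameDemo classes are all assumptions made up for illustration.

```java
// Hypothetical, simplified stand-in for the renamed interface.
// The real interface may carry more methods than these two.
interface Symbol {
    char getToken();   // assumed replacement for Residue.getSymbol()
    String getName();
}

// Illustrative implementation only; not a class from the library.
final class SimpleSymbol implements Symbol {
    private final char token;
    private final String name;

    SimpleSymbol(char token, String name) {
        this.token = token;
        this.name = name;
    }

    public char getToken() { return token; }
    public String getName() { return name; }
}

final class SymbolRenameDemo {
    // Reads as symbol.getToken() rather than symbol.getSymbol().
    static String describe(Symbol symbol) {
        return symbol.getName() + " (" + symbol.getToken() + ")";
    }

    public static void main(String[] args) {
        Symbol a = new SimpleSymbol('a', "adenine");
        System.out.println(describe(a)); // prints "adenine (a)"
    }
}
```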

I propose the following refactoring:

Rename all the Residue* classes to Symbol*

Create a package org.biojava.bio.symbol which will contain the symbolic
algebra interfaces:
  Residue
  Alphabet
  FiniteAlphabet
  ResidueList
  CrossProductAlphabet
  CrossProductResidue
exceptions:
  IllegalResidueException
  IllegalAlphabetException

It will also contain all the direct implementations of these interfaces,
including those that currently reside in seq.tools. AlphabetManager would
also move into this package, as would SuffixTree (this class bounces around
a lot).

DNATools and ProteinTools would move into bio.seq, as they capture real
biological information. ComplementResidueList (aka ComplementSymbolList)
would become package-private in bio.seq, and we would add a method
DNATools.complement(SymbolList dna) to complement a DNA sequence using this
class.
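
The shape of that arrangement could be sketched roughly as follows. This is
only an illustration under stated assumptions: the cut-down SymbolList
interface (char tokens, 0-based tokenAt), the StringSymbolList helper, the
tokens() method and the demo class are all invented here, and DNATools is
left non-public only so the sketch fits in one file. The point is that
callers only ever see DNATools.complement(), while the view class stays
hidden inside the package.

```java
// Hypothetical minimal SymbolList: a read-only sequence of one-char tokens.
interface SymbolList {
    int length();
    char tokenAt(int index); // 0-based in this sketch
}

// Illustrative helper backing a SymbolList with a String.
final class StringSymbolList implements SymbolList {
    private final String tokens;
    StringSymbolList(String tokens) { this.tokens = tokens; }
    public int length() { return tokens.length(); }
    public char tokenAt(int index) { return tokens.charAt(index); }
}

// No 'public' modifier: only visible inside its own package.
// A view over the source list that complements each token on access.
final class ComplementSymbolList implements SymbolList {
    private final SymbolList source;
    ComplementSymbolList(SymbolList source) { this.source = source; }
    public int length() { return source.length(); }
    public char tokenAt(int index) {
        char token = source.tokenAt(index);
        switch (token) {
            case 'a': return 't';
            case 't': return 'a';
            case 'g': return 'c';
            case 'c': return 'g';
            default:
                throw new IllegalArgumentException("not a DNA token: " + token);
        }
    }
}

// The only entry point callers need (would be public in a real library).
final class DNATools {
    private DNATools() {}

    // Complements without reversing; hides which class does the work.
    static SymbolList complement(SymbolList dna) {
        return new ComplementSymbolList(dna);
    }

    // Hypothetical convenience for printing a list as a string of tokens.
    static String tokens(SymbolList list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.length(); i++) {
            sb.append(list.tokenAt(i));
        }
        return sb.toString();
    }
}

final class ComplementDemo {
    public static void main(String[] args) {
        SymbolList dna = new StringSymbolList("gattaca");
        System.out.println(DNATools.tokens(DNATools.complement(dna))); // prints "ctaatgt"
    }
}
```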

Lastly, Annotation, Annotatable and possibly Location can be moved to
org.biojava.bio, to reflect their ubiquitous nature across the other
interfaces.

Please mail me and/or the list if you have any views about how things should
be moved around, or if you think that things are missing, or if there is
anything that you would like to see removed altogether.

All the best,

Matthew