[Biojava-l] implementing writeSequence for EMBL/Genbank

Keith James kdj@sanger.ac.uk
21 Feb 2001 16:45:27 +0000


I'm about to start on writeSequence for EMBL format (and other
EMBL-like formats). These will probably share a bunch of utility
methods, e.g. writing/wrapping lines with variable leader strings,
formatting sequence blocks and other tiresome stuff. Perhaps a
utility class with static methods for these?
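
To make that concrete, here is a rough sketch of the kind of static
helpers I mean. Nothing here exists in BioJava: the class and method
names are hypothetical, and the sequence-block helper is simplified
(a real EMBL sequence line also carries leading indentation and a
running residue count, which I have left out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of BioJava.
final class EmblLikeWriterUtilSketch {
    private EmblLikeWriterUtilSketch() {}

    // Word-wraps text so that leader + content fits within width columns.
    // A single word longer than the available room is emitted on its own
    // (over-long) line rather than being split.
    static List<String> wrapWithLeader(String leader, String text, int width) {
        int room = width - leader.length();
        if (room < 1) {
            throw new IllegalArgumentException("width must exceed leader length");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when text is empty or all whitespace
            }
            if (current.length() > 0 && current.length() + 1 + word.length() > room) {
                lines.add(leader + current);
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(leader + current);
        }
        return lines;
    }

    // Splits residues into space-separated blocks of blockSize symbols,
    // with at most blocksPerLine blocks on each output line.
    static List<String> formatSequenceBlocks(String residues, int blockSize, int blocksPerLine) {
        if (blockSize < 1 || blocksPerLine < 1) {
            throw new IllegalArgumentException("blockSize and blocksPerLine must be positive");
        }
        List<String> lines = new ArrayList<>();
        int perLine = blockSize * blocksPerLine;
        for (int start = 0; start < residues.length(); start += perLine) {
            int end = Math.min(start + perLine, residues.length());
            StringBuilder line = new StringBuilder();
            for (int i = start; i < end; i += blockSize) {
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(residues, i, Math.min(i + blockSize, end));
            }
            lines.add(line.toString());
        }
        return lines;
    }
}
```

A format's writeSequence could then call, say,
wrapWithLeader("CC   ", comment, 80) for each annotation line and
formatSequenceBlocks(residues, 10, 6) for the sequence itself, with
the leader strings being the only format-specific part.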

At the moment the EmblLikeFormat class has a writeSequence method
which explicitly throws a RuntimeException, as does
GenbankFormat. Only FastaFormat is not "faking it".

As EmblLikeFormat is generic, how is writeSequence to know whether (and
how) to write EMBL or SwissProt etc.? The writeSequence method is not
parameterized to accept a formatting object. If one were supplied at
the constructor instead, it would impinge on sequence reading, which
does just fine without help from other classes. Anyway, a formatting
object for a format object doesn't seem very elegant.

Matt has suggested splitting the SequenceFormat interface into two:
one for reading, one for writing. I don't know enough about design to
judge the merits of this in the long term. As a naive user of BioJava
I quite like the one-stop SequenceFormat which knows all about
reading/writing its own data; "it does exactly what it says on the
tin". But maybe I'm wrong...
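
For what it's worth, the two views need not be mutually exclusive.
The sketch below is only an illustration under my own assumptions,
not BioJava's actual API: every name ending in "Sketch" is made up,
and sequences are plain Strings to keep it self-contained. It shows
a reader interface and a writer interface, plus a one-stop interface
that simply extends both.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;

// Hypothetical reading half of a split format interface.
interface SequenceReaderSketch {
    String readSequence(BufferedReader in) throws IOException;
}

// Hypothetical writing half.
interface SequenceWriterSketch {
    void writeSequence(String residues, PrintStream out) throws IOException;
}

// The "one-stop" view: a format that knows how to do both.
interface OneStopFormatSketch extends SequenceReaderSketch, SequenceWriterSketch {
}

// Toy format used only for illustration: one sequence per line.
final class RawLineFormatSketch implements OneStopFormatSketch {
    @Override
    public String readSequence(BufferedReader in) throws IOException {
        String line = in.readLine();
        return line == null ? "" : line.trim();
    }

    @Override
    public void writeSequence(String residues, PrintStream out) {
        out.println(residues);
    }
}
```

Code that only parses could ask for a SequenceReaderSketch, while a
naive user like me could still hold a single OneStopFormatSketch and
call both methods on it.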

I'd like to get some opinions before I start hacking.

cheers,

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA