[Biojava-l] Sequence IO enhancements -- EMBL and Genbank

Thomas Down td2@sanger.ac.uk
Wed, 23 Feb 2000 16:10:14 +0000


I've just added new EMBL and Genbank file format classes to the
org.biojava.bio.seq.io package.  Both of these should plug in
seamlessly to the existing sequence IO framework.  They generate
Sequence objects with Features attached corresponding to the
full set of feature records from the files.

The main issues right now are:

  - Both are read-only at the moment (hopefully soon to
    change).

  - Much of the header information is currently lost.  It should
    be easy to record this in the Annotation object attached
    to the Sequence.  I'd be interested to know people's views on
    how data in the Annotation object should be handled (in particular,
    it would be nice to have some `well-known' annotation tags
    for important pieces of information).
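
To make the `well-known' tags idea concrete, here is a rough sketch of
what such a convention might look like.  Everything in it is hypothetical:
the WellKnownTags class, the tag names, and the plain Map standing in for
the Annotation object are illustrative assumptions, not existing BioJava
API.  The only part taken from the file format itself is the set of EMBL
line codes (AC, DE, OS).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for agreed-upon annotation keys; not part of BioJava.
final class WellKnownTags {
    static final String ACCESSION = "accession";
    static final String DESCRIPTION = "description";
    static final String ORGANISM = "organism";

    private WellKnownTags() {}

    // Map an EMBL header line code (e.g. "AC") to a well-known tag,
    // or return null if no tag has been agreed for that line code.
    static String tagForEmblLineCode(String code) {
        switch (code) {
            case "AC": return ACCESSION;
            case "DE": return DESCRIPTION;
            case "OS": return ORGANISM;
            default:   return null;
        }
    }

    // Record one header line in a plain map (a stand-in for Annotation).
    // Lines without an agreed tag are kept under their raw line code,
    // so header information is not silently dropped.
    static void record(Map<String, String> annotation, String code, String value) {
        String tag = tagForEmblLineCode(code);
        annotation.put(tag != null ? tag : code, value);
    }
}
```

The fallback to the raw line code is one possible design choice: it keeps
unrecognised header lines available to callers while the list of agreed
tags is still being worked out.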

The exact mapping of feature records in files to BioJava Feature
objects is possibly not quite as smart as it could be, either
(again, I'd be interested to know people's views on `well-known'
tags for Annotation objects).  If you want to tune this behaviour,
you can just plug in a new FeatureBuilder implementation -- take
a look at SimpleFeatureBuilder for an example.
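
As a rough illustration of the plug-in idea, the sketch below shows the
general pattern of swapping in a different record-to-feature mapping.  It
deliberately does not reproduce the real FeatureBuilder interface: the
FeatureRecord, RecordMapper, DefaultMapper and GeneNameMapper names are
all hypothetical, and a plain Map stands in for the resulting feature.
Check SimpleFeatureBuilder in the source tree for the actual signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one parsed feature-table record.
final class FeatureRecord {
    final String key;                      // feature key, e.g. "CDS"
    final String location;                 // raw location string, e.g. "100..200"
    final Map<String, String> qualifiers;  // e.g. gene -> "abc1"

    FeatureRecord(String key, String location, Map<String, String> qualifiers) {
        this.key = key;
        this.location = location;
        this.qualifiers = qualifiers;
    }
}

// Hypothetical plug-in point: turns a record into feature properties.
interface RecordMapper {
    Map<String, String> toFeatureProperties(FeatureRecord record);
}

// Baseline mapping: copies the feature key and location only.
final class DefaultMapper implements RecordMapper {
    public Map<String, String> toFeatureProperties(FeatureRecord record) {
        Map<String, String> props = new HashMap<>();
        props.put("type", record.key);
        props.put("location", record.location);
        return props;
    }
}

// Tuned mapping: wraps another mapper and promotes the "gene"
// qualifier, when present, to a "name" property.
final class GeneNameMapper implements RecordMapper {
    private final RecordMapper delegate;

    GeneNameMapper(RecordMapper delegate) {
        this.delegate = delegate;
    }

    public Map<String, String> toFeatureProperties(FeatureRecord record) {
        Map<String, String> props = delegate.toFeatureProperties(record);
        String gene = record.qualifiers.get("gene");
        if (gene != null) {
            props.put("name", gene);
        }
        return props;
    }
}
```

Wrapping a delegate rather than subclassing keeps each mapping rule small
and lets several rules be stacked, which may or may not match how the
real builder classes are meant to be extended.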

Hope this is useful,

``He returned with milk, flour, sugar... and incidentally,
a new body.''   -- Octavia Butler