[Biojava-l] Handling assemblies

Thomas Down td2@sanger.ac.uk
Wed, 9 Aug 2000 12:54:55 +0100


Hi...

One of the features I'd like to see in good time for the
1.1 release is a framework for handling sequence assemblies
(contigs).  I need this to support my implementation of the
DAS protocol (http://stein.cshl.org/das/), but I can see it
being helpful for many BioJava applications.

I think the following objectives are important when doing
this:

  - Lightweight solution: this shouldn't be a cause for
    serious interface bloat.

  - Take advantage of existing BioJava strengths (e.g.
    hierarchical Feature representation).

  - No restrictions on the internal data model used to store
    contig data.

I suggest we add a single interface, org.biojava.bio.seq.ComponentFeature.
This is a sub-interface of StrandedFeature, and represents one of
the component sequence fragments which make up an assembly.
This interface should look something like:

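  public interface ComponentFeature extends StrandedFeature {
      // The sequence which this fragment is taken from.
      public Sequence getComponentSequence();
      // The part of the component sequence which is attached.
      public Location getComponentLocation();
  }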

This indicates that a part of componentSequence has been
`attached' to the sequence which contains the ComponentFeature.
An assembly is simply defined as any Sequence object which
contains one or more ComponentFeatures -- I don't believe
that any special interface is needed for the assemblies themselves.
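
To make the idea concrete, here is a minimal sketch of how residues
could be assembled from a list of components.  Everything in it is an
assumption made for illustration: the AssemblySketch and Component
classes are simplified stand-ins (plain Strings, 1-based inclusive
coordinates, 'n' for gaps), not the BioJava interfaces and not the
SimpleAssembly code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration; NOT the BioJava API.
final class AssemblySketch {
    /** One region of a component sequence attached to the assembly. */
    static final class Component {
        final String componentSeq; // full sequence of the fragment
        final int compMin;         // first residue used (1-based, inclusive)
        final int compMax;         // last residue used (1-based, inclusive)
        final int asmMin;          // start position on the assembly (1-based)
        final boolean reverse;     // true if attached on the negative strand

        Component(String componentSeq, int compMin, int compMax,
                  int asmMin, boolean reverse) {
            this.componentSeq = componentSeq;
            this.compMin = compMin;
            this.compMax = compMax;
            this.asmMin = asmMin;
            this.reverse = reverse;
        }
    }

    static char complement(char c) {
        switch (c) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return 'n';
        }
    }

    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            sb.append(complement(s.charAt(i)));
        }
        return sb.toString();
    }

    /** Builds the assembled residues; uncovered positions stay 'n'. */
    static String assemble(int length, List<Component> components) {
        char[] out = new char[length];
        Arrays.fill(out, 'n');
        for (Component c : components) {
            // Cut out the attached region (convert to 0-based substring).
            String part = c.componentSeq.substring(c.compMin - 1, c.compMax);
            if (c.reverse) {
                part = reverseComplement(part);
            }
            for (int i = 0; i < part.length(); i++) {
                out[c.asmMin - 1 + i] = part.charAt(i);
            }
        }
        return new String(out);
    }
}
```

For example, attaching residues 3-6 of "aaccggtt" at position 1, and
residues 1-3 of "ttgca" on the reverse strand at position 7, gives a
9-residue assembly with a two-residue gap in the middle.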

If the componentSequence has any Features of its own, these
should be `projected' onto the assembled Sequence object.  The
exact semantics of this projection are probably best left to
Sequence implementors, although I'm working on some guidelines
at the moment.
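
As one possible reading of `projection', the coordinate arithmetic
might look like the sketch below.  This is only an illustration under
my own assumptions (1-based inclusive coordinates, strand as +1/-1,
features lying partly outside the attached region are simply dropped);
the ProjectionSketch class and its project method are hypothetical and
are not the support classes mentioned in this message.

```java
// Hypothetical helper for illustration; NOT part of BioJava.
final class ProjectionSketch {
    /**
     * Maps a feature from component coordinates to assembly coordinates.
     * Returns {min, max, strand}, or null if the feature is not fully
     * inside the attached region [compMin, compMax].
     */
    static int[] project(int fMin, int fMax, int fStrand,
                         int compMin, int compMax,
                         int asmMin, boolean reverse) {
        if (fMin < compMin || fMax > compMax) {
            return null; // assumption: drop features that overhang
        }
        int asmMax = asmMin + (compMax - compMin);
        if (!reverse) {
            // Forward attachment: a simple offset, strand unchanged.
            return new int[] {
                asmMin + (fMin - compMin),
                asmMin + (fMax - compMin),
                fStrand
            };
        }
        // Reverse attachment: coordinates are mirrored within the
        // attached region and the strand is flipped.
        return new int[] {
            asmMax - (fMax - compMin),
            asmMax - (fMin - compMin),
            -fStrand
        };
    }
}
```

With residues 101-200 of a component attached at assembly position
1001, a forward-strand feature at 110-120 would land at 1010-1020 for
a forward attachment, and at 1081-1091 on the opposite strand for a
reverse attachment.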

What I've got so far:

  - The ComponentFeature interface.

  - Various support classes which help with the `feature
    projection' part of implementing an assembly.

  - SimpleAssembly, a quick in-memory implementation of
    Sequence which assembles sequence data from ComponentFeatures.

If there aren't any immediate objections to my assembly plan,
I'll check these in and let people have a play with them.
Initial indications from my experiments with SimpleAssembly
suggest that the framework works quite well, but I'd like
to hear plenty of feedback on this one.

Happy hacking (and sorry for the long message),
   Thomas.
-- 
One of the advantages of being disorderly is that one is
constantly making exciting discoveries.
                                       -- A. A. Milne