[Biojava-l] GFF & feature creation

Matthew Pocock mrp@sanger.ac.uk
Mon, 21 Feb 2000 18:21:15 +0000


Dear all,

We should be sorting out the web site this week. With any luck, we will
also be able to provide some sort of web-based CVS access.

I have just added org.biojava.bio.program.gff to handle GFF-related
stuff. There is a parser, a GFF record class, a set of records and some
other plumbing you don't need to look at. You can now load a GFF file
and create features on BioJava sequence objects. I will add a demo
tomorrow.
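The post does not show the record layout, so the following is only a sketch of the kind of work a GFF parser does, written from the nine-column GFF layout (seqname, source, feature, start, end, score, strand, frame, and an optional ninth group/attribute field). `GffRecord`, `parse` and `parseAll` are hypothetical names, not the classes in org.biojava.bio.program.gff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal GFF record; an illustration only, not the
// classes in org.biojava.bio.program.gff.
final class GffRecord {
    final String seqName, source, feature;
    final int start, end;      // 1-based, inclusive
    final Double score;        // null when the column is "."
    final char strand;         // '+', '-' or '.'
    final String frame;        // "0", "1", "2" or "."
    final String attributes;   // rest of the line after column 8, or ""

    private GffRecord(String[] f) {
        seqName = f[0];
        source = f[1];
        feature = f[2];
        start = Integer.parseInt(f[3]);
        end = Integer.parseInt(f[4]);
        score = f[5].equals(".") ? null : Double.valueOf(f[5]);
        if (f[6].length() != 1 || "+-.".indexOf(f[6].charAt(0)) < 0) {
            throw new IllegalArgumentException("bad strand: " + f[6]);
        }
        strand = f[6].charAt(0);
        frame = f[7];
        attributes = f.length > 8 ? f[8] : "";
    }

    // Parses one tab-separated data line.
    static GffRecord parse(String line) {
        String[] f = line.split("\t", 9); // limit 9 keeps tabs inside the last field
        if (f.length < 8) {
            throw new IllegalArgumentException("expected at least 8 tab-separated fields: " + line);
        }
        return new GffRecord(f);
    }

    // Parses many lines, skipping blank lines and '#' comment/directive lines.
    static List<GffRecord> parseAll(List<String> lines) {
        List<GffRecord> out = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) continue;
            out.add(parse(line));
        }
        return out;
    }
}
```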

To deal with the strandedness of GFF records, I have added
StrandedFeature as a sub-interface of Feature. This forced me to take
another run at the tangle of feature types vs. sequence types vs. the
Sequence and Feature interfaces. If you don't care, don't read on. If
you only write scripts, it should pretty much never affect you.
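A minimal sketch of why a sub-interface leaves existing code alone: StrandedFeature only adds a method, so anything written against Feature accepts stranded features unchanged. The method names (`getStart`, `getEnd`, `getType`, `getStrand`), the `Strand` enum and the helper classes are assumptions for illustration, not BioJava's actual signatures.

```java
// Hypothetical method names; the real BioJava interfaces may differ.
interface Feature {
    int getStart();   // 1-based, inclusive
    int getEnd();     // 1-based, inclusive
    String getType();
}

enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

// The sub-interface only adds a method, so code written against Feature is unaffected.
interface StrandedFeature extends Feature {
    Strand getStrand();
}

// A trivial implementation, used only to exercise the interfaces.
final class BasicStrandedFeature implements StrandedFeature {
    private final int start, end;
    private final String type;
    private final Strand strand;

    BasicStrandedFeature(int start, int end, String type, Strand strand) {
        this.start = start;
        this.end = end;
        this.type = type;
        this.strand = strand;
    }

    public int getStart() { return start; }
    public int getEnd() { return end; }
    public String getType() { return type; }
    public Strand getStrand() { return strand; }
}

final class Features {
    // Written against Feature only; accepts a StrandedFeature without changes.
    static int length(Feature f) {
        return f.getEnd() - f.getStart() + 1;
    }

    // Code that cares about strand checks for the richer interface.
    static Strand strandOf(Feature f) {
        return (f instanceof StrandedFeature)
                ? ((StrandedFeature) f).getStrand()
                : Strand.UNKNOWN;
    }
}
```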

The issues are:

Any particular feature interface should work across different Sequence
implementations, so an Oracle-backed sequence should be able to serve up
StrandedFeature objects, and so should an AceDB-backed one. However,
each will want to use its own implementation class. This argues for some
sort of factory method on Sequence that creates features of the correct
type.
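A minimal sketch of that first idea, assuming hypothetical names (`createFeature`, `InMemorySequence`, `DatabaseSequence`): a single factory method on Sequence lets each backend choose its own Feature class while callers only ever see the Feature interface. The database backend is a stand-in; no real Oracle or AceDB code is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces; callers only ever see these types.
interface Feature {
    int getStart();
    int getEnd();
    String getType();
}

interface Sequence {
    // Single factory method: each implementation picks its own Feature class.
    Feature createFeature(int start, int end, String type);
    List<Feature> getFeatures();
}

// Backend 1: plain in-memory objects.
final class InMemorySequence implements Sequence {
    private final List<Feature> features = new ArrayList<>();

    private static final class MemoryFeature implements Feature {
        private final int start, end;
        private final String type;
        MemoryFeature(int start, int end, String type) {
            this.start = start; this.end = end; this.type = type;
        }
        public int getStart() { return start; }
        public int getEnd() { return end; }
        public String getType() { return type; }
    }

    public Feature createFeature(int start, int end, String type) {
        Feature f = new MemoryFeature(start, end, type);
        features.add(f);
        return f;
    }

    public List<Feature> getFeatures() { return features; }
}

// Backend 2: stands in for a database-backed store; here each feature
// simply remembers a made-up row id alongside its data.
final class DatabaseSequence implements Sequence {
    private final List<Feature> features = new ArrayList<>();
    private long nextRowId = 1;

    private static final class RowFeature implements Feature {
        final long rowId;
        private final int start, end;
        private final String type;
        RowFeature(long rowId, int start, int end, String type) {
            this.rowId = rowId; this.start = start; this.end = end; this.type = type;
        }
        public int getStart() { return start; }
        public int getEnd() { return end; }
        public String getType() { return type; }
    }

    public Feature createFeature(int start, int end, String type) {
        Feature f = new RowFeature(nextRowId++, start, end, type);
        features.add(f);
        return f;
    }

    public List<Feature> getFeatures() { return features; }
}
```

The catch, as the next paragraph says, is the fixed signature: there is nowhere to pass a strand or any other interface-specific data.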

  but:

Different feature interfaces will carry different data (e.g. strand,
residue frequencies, splicing info, beta-sheet direction), so this argues
for one factory per feature interface.

  so:

It looks like we need one factory per sequence implementation per
feature interface, plus some way of making sure the correct factory is
used in the appropriate place.
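One way to read "one factory per sequence implementation per feature interface" is a Sequence interface with one factory method per feature interface, which every implementation must then provide. This sketch uses hypothetical method names (`createFeature`, `createStrandedFeature`) and shows why the naive version scales badly: each new feature interface adds a method that every Sequence implementation has to implement.

```java
enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

interface Feature { int getStart(); int getEnd(); }

interface StrandedFeature extends Feature { Strand getStrand(); }

// Naive design: one factory method per feature interface. Adding a new
// feature interface means adding a method here, and every Sequence
// implementation must then implement it: N interfaces x M implementations.
interface Sequence {
    Feature createFeature(int start, int end);
    StrandedFeature createStrandedFeature(int start, int end, Strand strand);
}

final class InMemorySequence implements Sequence {
    private static class Plain implements Feature {
        final int start, end;
        Plain(int start, int end) { this.start = start; this.end = end; }
        public int getStart() { return start; }
        public int getEnd() { return end; }
    }

    private static final class Stranded extends Plain implements StrandedFeature {
        final Strand strand;
        Stranded(int start, int end, Strand strand) { super(start, end); this.strand = strand; }
        public Strand getStrand() { return strand; }
    }

    public Feature createFeature(int start, int end) {
        return new Plain(start, end);
    }

    public StrandedFeature createStrandedFeature(int start, int end, Strand strand) {
        return new Stranded(start, end, strand);
    }
}
```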

  solution:

Use something like the Memento pattern to package up the constructor
arguments (location, source, etc.). Each feature interface defines a
nested class named 'Template', so there is Feature.Template and
StrandedFeature.Template (which adds a strand field). Sequence then
provides a factory method that takes a template and returns a feature
implementing the corresponding interface. It is up to the Sequence
implementation to decide which class to instantiate for which template
(SimpleSequence already does this for you).
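The template scheme can be sketched as follows. The names Feature.Template, StrandedFeature.Template, SimpleSequence, SimpleFeature and SimpleStrandedFeature follow the post, but the bodies, the public template fields and the `createFeature` method name are assumptions written for illustration; this is not the actual BioJava source.

```java
import java.util.ArrayList;
import java.util.List;

enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

interface Feature {
    int getStart();
    int getEnd();
    String getSource();

    // Bag of constructor arguments. A class nested in an interface is
    // implicitly public and static.
    class Template {
        public int start;
        public int end;
        public String source;
    }
}

interface StrandedFeature extends Feature {
    Strand getStrand();

    // Adds a strand field; hides Feature.Template inside this interface.
    class Template extends Feature.Template {
        public Strand strand = Strand.UNKNOWN;
    }
}

class SimpleFeature implements Feature {
    private final int start, end;
    private final String source;

    SimpleFeature(Feature.Template t) { start = t.start; end = t.end; source = t.source; }

    public int getStart() { return start; }
    public int getEnd() { return end; }
    public String getSource() { return source; }
}

class SimpleStrandedFeature extends SimpleFeature implements StrandedFeature {
    private final Strand strand;

    SimpleStrandedFeature(StrandedFeature.Template t) { super(t); strand = t.strand; }

    public Strand getStrand() { return strand; }
}

interface Sequence {
    // One factory method serves every feature interface.
    Feature createFeature(Feature.Template template);
}

class SimpleSequence implements Sequence {
    private final List<Feature> features = new ArrayList<>();

    public Feature createFeature(Feature.Template t) {
        // Test the most specific template type first.
        Feature f = (t instanceof StrandedFeature.Template)
                ? new SimpleStrandedFeature((StrandedFeature.Template) t)
                : new SimpleFeature(t);
        features.add(f);
        return f;
    }

    public List<Feature> getFeatures() { return features; }
}
```

The caller still receives a Feature and casts to StrandedFeature; the template only decides which class gets built. Strictly, the template behaves more like a parameter object than a classic Memento, which captures an object's state so it can be restored later.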

To see how it works in practice, look at GFFEntrySet.getAnnotator for
an example of making features, and peek inside SimpleSequence,
SimpleFeature and SimpleStrandedFeature to see how the guts are wired
together.
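The following sketches the kind of conversion a GFF annotator has to perform: each GFF data line becomes a StrandedFeature.Template, and the sequence decides which class to build. `GffAnnotator`, `SketchSequence` and the strand mapping ('+', '-', anything else treated as unknown) are hypothetical; this is not the code of GFFEntrySet.getAnnotator.

```java
import java.util.ArrayList;
import java.util.List;

enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

// Minimal hypothetical interfaces with nested templates.
interface Feature {
    int getStart();
    int getEnd();
    String getSource();
    String getType();

    class Template {
        public int start, end;
        public String source, type;
    }
}

interface StrandedFeature extends Feature {
    Strand getStrand();

    class Template extends Feature.Template {
        public Strand strand = Strand.UNKNOWN;
    }
}

interface Sequence {
    Feature createFeature(Feature.Template template);
}

// Stand-in sequence: copies the template's values into an immutable
// stranded feature; plain templates get Strand.UNKNOWN.
final class SketchSequence implements Sequence {
    public Feature createFeature(Feature.Template t) {
        final int start = t.start, end = t.end;
        final String source = t.source, type = t.type;
        final Strand strand = (t instanceof StrandedFeature.Template)
                ? ((StrandedFeature.Template) t).strand
                : Strand.UNKNOWN;
        return new StrandedFeature() {
            public int getStart() { return start; }
            public int getEnd() { return end; }
            public String getSource() { return source; }
            public String getType() { return type; }
            public Strand getStrand() { return strand; }
        };
    }
}

final class GffAnnotator {
    // Maps the GFF strand column; '.' and anything else become UNKNOWN.
    static Strand toStrand(String column) {
        switch (column) {
            case "+": return Strand.POSITIVE;
            case "-": return Strand.NEGATIVE;
            default:  return Strand.UNKNOWN;
        }
    }

    // Builds one feature per GFF data line, skipping blank and '#' lines.
    static List<Feature> annotate(Sequence seq, List<String> gffLines) {
        List<Feature> made = new ArrayList<>();
        for (String line : gffLines) {
            if (line.trim().isEmpty() || line.startsWith("#")) continue;
            String[] f = line.split("\t", 9);
            if (f.length < 8) throw new IllegalArgumentException("not a GFF data line: " + line);
            StrandedFeature.Template t = new StrandedFeature.Template();
            t.source = f[1];
            t.type = f[2];
            t.start = Integer.parseInt(f[3]);
            t.end = Integer.parseInt(f[4]);
            t.strand = toStrand(f[6]);
            made.add(seq.createFeature(t));
        }
        return made;
    }
}
```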

This scheme is extensible: you can add your own feature interfaces and
implementations. It is also nicely type-safe for other sequence
implementations. It will do for now.
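Extending the scheme might look like this: a user-defined feature interface declares its own nested Template, supplies an implementation, and a Sequence subclass recognises the new template while deferring everything else to the parent factory. `SplicedFeature`, `SimpleSplicedFeature`, `SplicingAwareSequence` and the exon-start representation of splicing info are hypothetical, and the core types here are simplified stand-ins rather than BioJava's real classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// --- Simplified stand-ins for the core types (hypothetical) ---
interface Feature {
    int getStart();
    int getEnd();

    class Template {
        public int start, end;
    }
}

class SimpleFeature implements Feature {
    private final int start, end;
    SimpleFeature(Feature.Template t) { start = t.start; end = t.end; }
    public int getStart() { return start; }
    public int getEnd() { return end; }
}

class SimpleSequence {
    public Feature createFeature(Feature.Template t) {
        return new SimpleFeature(t);
    }
}

// --- User extension: a new feature interface with its own template ---
// Hypothetical interface carrying splicing info as exon start positions.
interface SplicedFeature extends Feature {
    List<Integer> getExonStarts();

    class Template extends Feature.Template {
        public List<Integer> exonStarts = new ArrayList<>();
    }
}

class SimpleSplicedFeature extends SimpleFeature implements SplicedFeature {
    private final List<Integer> exonStarts;

    SimpleSplicedFeature(SplicedFeature.Template t) {
        super(t);
        // Defensive copy so later changes to the template do not leak in.
        exonStarts = Collections.unmodifiableList(new ArrayList<>(t.exonStarts));
    }

    public List<Integer> getExonStarts() { return exonStarts; }
}

// Recognises the new template type and defers everything else to SimpleSequence.
class SplicingAwareSequence extends SimpleSequence {
    @Override
    public Feature createFeature(Feature.Template t) {
        if (t instanceof SplicedFeature.Template) {
            return new SimpleSplicedFeature((SplicedFeature.Template) t);
        }
        return super.createFeature(t);
    }
}
```

Existing templates keep working because the subclass falls back to the parent factory for anything it does not recognise.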

I will update all the docs soon.

Matthew