[Biojava-l] [newio] Proposed event-notification interfaces

Ann Loraine loraine@loraine.net
Thu, 9 Nov 2000 14:58:39 -0800 (PST)


On Thu, 9 Nov 2000, Thomas Down wrote:

> Hi...
> 
> I've been making a little more progress with my plans for
> refactoring the sequence I/O framework for BioJava 1.1.  I've
> attached two interfaces:
> 
>   SeqIOListener     Generic listener for events produced by
>                     parsing biological sequence data
> 
>   SequenceBuilder   SeqIOListener which builds a new BioJava
>                     sequence object.
> 
> Rebuilding the I/O framework around these interfaces would
> meet the following objectives:
> 
>   - Decoupling all parts of the Sequence construction process
>     from the file parsing.

Yes!  I like this concept!

> 
>   - An easy way to plug in filter and transducer objects between
>     the parser and the Sequence construction step.

Yes again!
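
A minimal sketch of what plugging a filter between the parser and the
builder could look like.  The names here (FeatureListener, TypeFilter,
RecordingBuilder) are made up for illustration and are not the proposed
SeqIOListener or SequenceBuilder interfaces, whose methods are not shown
in this message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parse-event listener (not SeqIOListener). */
interface FeatureListener {
    void feature(String type, int start, int end);
}

/** Filter: forwards only features of one type to the next listener. */
class TypeFilter implements FeatureListener {
    private final String wantedType;
    private final FeatureListener next;

    TypeFilter(String wantedType, FeatureListener next) {
        this.wantedType = wantedType;
        this.next = next;
    }

    @Override
    public void feature(String type, int start, int end) {
        if (wantedType.equals(type)) {
            next.feature(type, start, end);
        }
    }
}

/** End of the chain: stands in for the object that builds the sequence. */
class RecordingBuilder implements FeatureListener {
    final List<String> seen = new ArrayList<>();

    @Override
    public void feature(String type, int start, int end) {
        seen.add(type + ":" + start + "-" + end);
    }
}
```

The parser only ever sees a FeatureListener, so it does not need to know
whether it is talking to a filter or to the builder itself.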

> 
>   - Potential to handle `feature-only' formats like GFF and GAME.

You could build a double-parser that extracts coordinates from
a GFF/GAME file and then grabs the corresponding sequence out of
a fasta db.
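
A rough sketch of the double-parser idea, assuming the usual
tab-separated GFF layout (sequence name in column 1, 1-based inclusive
start and end in columns 4 and 5) and a FASTA source small enough to
hold in memory.  GffFastaLookup and its methods are hypothetical names,
not BioJava classes, and strand is ignored here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of BioJava. */
final class GffFastaLookup {

    /** Parse FASTA text into id -> residues; the id is the first word after '>'. */
    static Map<String, String> parseFasta(String fasta) {
        Map<String, StringBuilder> acc = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : fasta.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                String id = line.substring(1).trim().split("\\s+")[0];
                current = new StringBuilder();
                acc.put(id, current);
            } else if (current != null) {
                current.append(line);
            }
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : acc.entrySet()) {
            out.put(e.getKey(), e.getValue().toString());
        }
        return out;
    }

    /** Return the residues covered by one GFF line (strand is not applied). */
    static String featureSequence(String gffLine, Map<String, String> seqs) {
        String[] fields = gffLine.split("\t");
        String residues = seqs.get(fields[0]);
        if (residues == null) {
            throw new IllegalArgumentException("No sequence for " + fields[0]);
        }
        int start = Integer.parseInt(fields[3]); // 1-based, inclusive
        int end = Integer.parseInt(fields[4]);   // 1-based, inclusive
        return residues.substring(start - 1, end);
    }
}
```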

> 
> Issues which are still open:
> 
>   - Exactly how should multiple sequence alignments be handled
>     within the framework?  One suggestion made internally at
>     Sanger would be to use a separate SequenceBuilder for each
>     component of the alignments.  I'd welcome comments from anyone
>     who uses BioJava Alignments on this topic.  Are there any
>     commonly used formats for `annotated' alignments, with
>     data which should be built into BioJava feature objects?

Please allow in-between-residue annotations as well as
on-top-of-residue annotations.

For instance, in-between annotations are useful for mapping splice
sites onto alignments.  On-top-of annotations are useful for flagging
individual residues.
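
One way the two kinds of annotation might be told apart is sketched
below.  AlignmentMark is a hypothetical type, not a BioJava class, and
the convention that a between-residues mark is stored as the column to
its left is an assumption made for this example.

```java
/** Hypothetical alignment annotation; not a BioJava type. */
final class AlignmentMark {
    enum Kind { ON_RESIDUE, BETWEEN_RESIDUES }

    final Kind kind;
    final int column;   // 1-based; for BETWEEN_RESIDUES, the column to the left
    final String label;

    private AlignmentMark(Kind kind, int column, String label) {
        this.kind = kind;
        this.column = column;
        this.label = label;
    }

    /** Mark that sits on top of a single alignment column. */
    static AlignmentMark onResidue(int column, String label) {
        return new AlignmentMark(Kind.ON_RESIDUE, column, label);
    }

    /** Mark that sits between leftColumn and leftColumn + 1. */
    static AlignmentMark between(int leftColumn, String label) {
        return new AlignmentMark(Kind.BETWEEN_RESIDUES, leftColumn, label);
    }

    String describe() {
        if (kind == Kind.ON_RESIDUE) {
            return label + " at " + column;
        }
        return label + " between " + column + " and " + (column + 1);
    }
}
```

For example, AlignmentMark.between(42, "splice site") describes a splice
site that falls between columns 42 and 43 without claiming either one.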

> 
>   - Are there any extra methods on SeqIOListener which I've
>     missed?  For instance, it's tempting to have a specific
>     method for notifying the listener about a sequence's
>     database ID, if this is present in the file.  Any thoughts?
> 

I would focus on designing the event class so that it can adequately
capture the information being parsed, and then write your listeners
based on the events.

It also seems like you would want a type of event general enough to
handle structured information (name-value pairs, named lists, etc.)
in cases where you don't know anything about the semantics of what's
coming.

In cases where you do, you could have your parser broadcast more
specialized events - subclasses of your very general base class event.
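
A small sketch of that split, with a general name-value event and one
specialized subclass.  All of the names (PropertyEvent, DatabaseIdEvent,
PropertyListener, CollectingListener, the "database_id" key) are
hypothetical and only meant to show the shape of the idea.

```java
import java.util.ArrayList;
import java.util.List;

/** General event: a named value whose meaning the parser does not interpret. */
class PropertyEvent {
    private final String name;
    private final Object value;

    PropertyEvent(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }
    Object getValue() { return value; }
}

/** Specialized event for a case where the parser does know the semantics. */
class DatabaseIdEvent extends PropertyEvent {
    DatabaseIdEvent(String id) {
        super("database_id", id);
    }

    String getDatabaseId() { return (String) getValue(); }
}

/** Hypothetical listener interface receiving both kinds of event. */
interface PropertyListener {
    void propertyParsed(PropertyEvent event);
}

/** Listener that understands database IDs and treats the rest generically. */
class CollectingListener implements PropertyListener {
    String databaseId;
    final List<String> otherNames = new ArrayList<>();

    @Override
    public void propertyParsed(PropertyEvent event) {
        if (event instanceof DatabaseIdEvent) {
            databaseId = ((DatabaseIdEvent) event).getDatabaseId();
        } else {
            otherNames.add(event.getName());
        }
    }
}
```

A listener that does not care about database IDs can ignore the subclass
and still receive the event as an ordinary name-value pair.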

The hard part in my mind is: where is the best place to put semantics?
For instance, what objects need to know about database id, locus name,
etc, and what objects just need to know about name-value/name-list pairs?

I hope this is useful!

-Ann