[Biojava-l] DNA Strider Format and SeqIOListener problem

Thomas Down td2@sanger.ac.uk
Tue, 12 Mar 2002 10:12:22 +0000


On Mon, Mar 11, 2002 at 11:17:28AM -0500, Marc Colosimo wrote:
> Hi,
> 
> I've been working on porting a format reader for DNA Strider files (a
> very popular Mac program). About 50 to 80 percent of it has been coded
> (20% documented). I have limited time to work on this and each
> time I pick it up I get stuck with implementing the SeqIOListener
> interface. There is very little documentation on how this works. I
> understand that I need to chain them together when calling StreamReader:

Hi...

To write a basic file-parser, you shouldn't have to write
any implementations of SeqIOListener (or SequenceBuilder).
The basic pattern for BioJava sequence parsing is just:

   Raw data ---> |SequenceFormat| ---> events ---> |SeqIOListener|

To support a new format, you just need to write a SequenceFormat
implementation, which parses raw data and passes information on
by calling methods on the SeqIOListener interface.
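
To make the shape of this concrete, here is a minimal, self-contained
sketch of the pattern. It does not use the real BioJava classes:
MiniSeqListener, RecordingListener and ToyFormat are hypothetical
stand-ins, the real interfaces have more methods and different
signatures, and the '>'-header record layout is invented purely for
illustration (it is not the DNA Strider layout).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for SeqIOListener (hypothetical, not the real BioJava interface).
interface MiniSeqListener {
    void startSequence();
    void setName(String name);
    void addSymbols(String residues);
    void endSequence();
}

// Listener that just records the events it receives, in order.
class RecordingListener implements MiniSeqListener {
    final List<String> events = new ArrayList<>();
    @Override public void startSequence() { events.add("start"); }
    @Override public void setName(String name) { events.add("name:" + name); }
    @Override public void addSymbols(String residues) { events.add("symbols:" + residues); }
    @Override public void endSequence() { events.add("end"); }
}

// Stand-in for a SequenceFormat. The record layout is invented for illustration.
class ToyFormat {
    // Parses one record and returns true if another record follows.
    boolean readSequence(BufferedReader in, MiniSeqListener listener) throws IOException {
        String header = in.readLine();
        if (header == null || !header.startsWith(">")) {
            throw new IOException("expected a '>' header line");
        }
        listener.startSequence();
        listener.setName(header.substring(1).trim());
        boolean more = false;
        while (true) {
            in.mark(1);                 // peek at the first character of the next line
            int c = in.read();
            if (c == -1) break;         // end of input
            in.reset();
            if (c == '>') { more = true; break; }  // next record starts here
            String line = in.readLine().trim();
            if (!line.isEmpty()) listener.addSymbols(line);
        }
        listener.endSequence();
        return more;
    }

    // Convenience driver: parse every record in a string and return the event log.
    static List<String> parseAll(String text) {
        RecordingListener rec = new RecordingListener();
        BufferedReader in = new BufferedReader(new StringReader(text));
        ToyFormat format = new ToyFormat();
        try {
            while (format.readSequence(in, rec)) { /* keep reading records */ }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rec.events;
    }
}
```

Calling ToyFormat.parseAll(">seq1\nACGT\n") yields the event log
[start, name:seq1, symbols:ACGT, end]. The point is only that the
format object owns all the parsing and the listener just receives
calls.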

If you want to instantiate normal in-memory objects,
then you should be using SimpleSequenceBuilder as the listener
at the end of this chain.

Obviously, it's possible to write `filters' which implement
SeqIOListener, receive one stream of events, then pass a
(slightly modified) event stream on to another listener.
A lot of the parsers supplied with BioJava actually consist
of a SequenceFormat which just parses the basic `shape' of
the file, then a filter which processes the data further.
This was originally done for the sake of code reuse, but I
admit it does make the system rather harder to follow.
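
For reference, a filter of this kind might look roughly like the
sketch below. Again, this is not the real BioJava API: SeqEvents is a
hypothetical two-method stand-in for SeqIOListener, and
NameTrimmingFilter and CollectingListener are invented names. The
filter changes one kind of event and forwards the rest untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for SeqIOListener (hypothetical; the real interface has more methods).
interface SeqEvents {
    void setName(String name);
    void addSymbols(String residues);
}

// A filter: receives events, modifies some of them, and forwards everything to a delegate.
class NameTrimmingFilter implements SeqEvents {
    private final SeqEvents delegate;

    NameTrimmingFilter(SeqEvents delegate) {
        this.delegate = delegate;
    }

    @Override public void setName(String name) {
        // Modified event: keep only the first whitespace-delimited token of the name.
        delegate.setName(name.trim().split("\\s+")[0]);
    }

    @Override public void addSymbols(String residues) {
        // Passed through unchanged.
        delegate.addSymbols(residues);
    }
}

// End of the chain: records whatever finally arrives.
class CollectingListener implements SeqEvents {
    final List<String> events = new ArrayList<>();
    @Override public void setName(String name) { events.add("name:" + name); }
    @Override public void addSymbols(String residues) { events.add("symbols:" + residues); }
}
```

Chaining is then just construction order: new NameTrimmingFilter(new
CollectingListener()) is handed to the parser, and the collecting
listener sees the cleaned-up names.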

I'd suggest that you don't bother with this pattern unless it
makes life significantly easier -- just put everything in
the SequenceFormat object.

As for the URI property...  This should contain a URI which
identifies the sequence, e.g.:

   file:///home/thomas/new-seq.seq
   http://www.genome.org/exciting-clone.fa
   urn:sequence/embl:AL121903

If there's no sensible way to generate a URI for a sequence,
I'd suggest just passing in the sequence name for this property.
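
One way to apply that fallback when reading from a local file is
sketched below. SequenceUris and uriFor are hypothetical helper names,
not part of BioJava; the only library call is the standard
java.nio.file.Path.toUri().

```java
import java.nio.file.Path;

// Hypothetical helper, not part of BioJava.
class SequenceUris {
    // Returns a file: URI for the source if there is one,
    // otherwise falls back to the bare sequence name.
    static String uriFor(Path source, String sequenceName) {
        if (source != null) {
            return source.toAbsolutePath().toUri().toString();
        }
        return sequenceName;
    }
}
```

The returned string is what you would hand to the listener's URI
property; with a null source it is simply the sequence name.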


Hope this helps,

    Thomas.