[Biojava-l] Re: DNA Strider Format and SeqIOListener problem

Marc Colosimo MEColosimo@alumni.carnegiemellon.edu
Tue, 12 Mar 2002 16:42:05 -0500


Thomas Down wrote:

> On Mon, Mar 11, 2002 at 11:17:28AM -0500, Marc Colosimo wrote:
> > Hi,
> >
> > I've been working on porting a format reader for DNA Strider files (a
> > very popular Mac program). About 50 to 80 percent of it has been coded
> > (20% documented). I have limited time to work on this and each
> > time I pick it up I get stuck with implementing the SeqIOListener
> > interface. There is very little documentation on how this works. I
> > understand that I need to chain them together when calling StreamReader:
>
> Hi...
>
> To write a basic file-parser, you shouldn't have to write
> any implementations of SeqIOListener (or SequenceBuilder).
> The basic pattern for BioJava sequence parsing is just:
>
>    Raw data ---> |SequenceFormat| ---> events ---> |SeqIOListener|
>
> To support a new format, you just need to write a SequenceFormat
> implementation, which parses raw data and passes information on
> by calling methods on the SeqIOListener interfaces.
>
> If you want to instantiate normal, in-memory, objects,
> then you should be using SimpleSequenceBuilder as the listener
> at the end of this chain.
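
A minimal sketch of the pattern described above, for illustration only. The types here (ToyListener, ToyFormat, CollectingListener) are hypothetical stand-ins and not the real BioJava SequenceFormat, SeqIOListener or SimpleSequenceBuilder classes, and the "file format" (name on the first line, residues on the rest) is invented. It only shows the shape of the chain: the format object parses raw data and reports what it finds by calling methods on a listener, and the listener at the end of the chain builds an in-memory result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the listener role; NOT the real BioJava SeqIOListener.
interface ToyListener {
    void startSequence();
    void setName(String name);
    void addResidues(String residues);
    void endSequence();
}

// Plays the SequenceFormat role: parses raw data and reports it as events.
// The layout is invented: first line is the name, remaining lines are residues.
class ToyFormat {
    void readSequence(BufferedReader in, ToyListener listener) throws IOException {
        listener.startSequence();
        String name = in.readLine();
        if (name != null) {
            listener.setName(name.trim());
        }
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                listener.addResidues(line);
            }
        }
        listener.endSequence();
    }
}

// Plays the builder role at the end of the chain: collects events in memory.
class CollectingListener implements ToyListener {
    String name;
    final StringBuilder residues = new StringBuilder();
    boolean finished = false;

    public void startSequence() { residues.setLength(0); finished = false; }
    public void setName(String name) { this.name = name; }
    public void addResidues(String r) { residues.append(r); }
    public void endSequence() { finished = true; }
}

class ToyPipelineDemo {
    // Raw data ---> ToyFormat ---> events ---> CollectingListener
    static CollectingListener parse(String raw) {
        CollectingListener listener = new CollectingListener();
        try {
            new ToyFormat().readSequence(new BufferedReader(new StringReader(raw)), listener);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return listener;
    }

    public static void main(String[] args) {
        CollectingListener result = parse("my-seq\nGATTACA\nGGCC\n");
        System.out.println(result.name + " " + result.residues);
    }
}
```

In this sketch, supporting a new format only means writing another class like ToyFormat; the listener side stays unchanged.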

This answered one of my main questions, but not the other. After looking over
the docs, I found a partial answer to that other question (where is there a
list of defined Features?).

The Docs for Feature say:

"We may need some standardisation (sp error) for what the fields mean. In
particular, we should be compliant where sensible with GFF. "

I guess this has not been done.

So in my case I should just do the following.

public final static String PROPERTY_DESCRIPTIONLINE = "description_line";

[snip]

siol.addSequenceProperty(PROPERTY_DESCRIPTIONLINE, description);

where siol = SimpleSequenceBuilder.FACTORY

>
>
> Obviously, it's possible to write `filters' which implement
> SeqIOListener, receive one stream of events, then pass a
> (slightly modified) event stream on to another listener.
> A lot of the parsers supplied with BioJava actually consist
> of a SequenceFormat which just parses the basic `shape' of
> the file, then a filter which processes the data further.
> This was originally done for the sake of code reuse, but I
> admit it does make the system rather harder to follow.
>
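
The filter idea described above can be sketched roughly as follows. This is an assumption-laden illustration, not BioJava code: PropertyListener is a hypothetical one-method stand-in for the real listener interface, and DescriptionTrimFilter and PropertyStore are invented names. The filter implements the same interface as the listener it wraps, adjusts one kind of event (here it trims whitespace from the "description_line" property), and forwards everything to the next listener.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the listener role; NOT the real BioJava SeqIOListener.
interface PropertyListener {
    void addSequenceProperty(Object key, Object value);
}

// Terminal listener: stores whatever properties it receives.
class PropertyStore implements PropertyListener {
    final Map<Object, Object> properties = new LinkedHashMap<>();

    public void addSequenceProperty(Object key, Object value) {
        properties.put(key, value);
    }
}

// Filter: receives one stream of events and passes a slightly modified
// stream on to the wrapped listener.
class DescriptionTrimFilter implements PropertyListener {
    static final String PROPERTY_DESCRIPTIONLINE = "description_line";

    private final PropertyListener delegate;

    DescriptionTrimFilter(PropertyListener delegate) {
        this.delegate = delegate;
    }

    public void addSequenceProperty(Object key, Object value) {
        // Only the description property is modified; all other events pass through.
        if (PROPERTY_DESCRIPTIONLINE.equals(key) && value instanceof String) {
            value = ((String) value).trim();
        }
        delegate.addSequenceProperty(key, value);
    }
}

class FilterChainDemo {
    public static void main(String[] args) {
        PropertyStore store = new PropertyStore();
        PropertyListener chain = new DescriptionTrimFilter(store);
        chain.addSequenceProperty(DescriptionTrimFilter.PROPERTY_DESCRIPTIONLINE, "  a Strider file  ");
        chain.addSequenceProperty("comment", "  left untouched  ");
        System.out.println(store.properties);
    }
}
```

The parser only ever sees a PropertyListener, so filters can be stacked or removed without touching the parsing code.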

Would adding defined (or standardized) features allow for a simple
SeqIOListener that implements them all?

>
> I'd suggest that you don't bother with this pattern unless it
> makes life significantly easier -- just put everything in
> the SequenceFormat object.
>
> As for the URI property...  This is to contain a URI which
> identifies the sequence.  e.g.:
>
>    file:///home/thomas/new-seq.seq
>    http://www.genome.org/exciting-clone.fa
>    urn:sequence/embl:AL121903
>
> If there's no sensible way to generate a URI for a sequence,
> I'd suggest just passing in the sequence name for this property.
>
> Hope this helps,
>
>     Thomas.
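
On the URI property, the fallback rule suggested above could look something like this. It is only a sketch: identifierFor is a hypothetical helper, and the "urn:sequence/<database>:<accession>" layout is simply copied from the example in the quoted text, on the assumption that it generalizes to other databases.

```java
class SequenceUriDemo {
    // Hypothetical helper: build a URN when a database and accession are known,
    // otherwise fall back to the plain sequence name.
    static String identifierFor(String database, String accession, String sequenceName) {
        if (database != null && accession != null) {
            return "urn:sequence/" + database + ":" + accession;
        }
        return sequenceName;
    }

    public static void main(String[] args) {
        System.out.println(identifierFor("embl", "AL121903", "exciting-clone"));
        System.out.println(identifierFor(null, null, "exciting-clone"));
    }
}
```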

Yes, it does.
Thanks,
Marc