[Biojava-dev] Proposed change to RichFormat interface

Richard Holland richard.holland at ebi.ac.uk
Wed Jun 7 12:36:49 UTC 2006


Hi guys.

See org.biojavax.bio.seq.io.DebuggingRichSeqIOListener.

It extends BufferedInputStream, so it can be used to wrap a normal
InputStream before that stream is passed around.

It also implements RichSeqIOListener.

The idea is that you do something like this:

	// Namespace that the parsed sequence will be assigned to.
	Namespace ns = RichObjectFactory.getDefaultNamespace();
	InputStream is = new FileInputStream("myFastaFile.fasta");
	FASTAFormat format = new FASTAFormat();

	// Wrap the raw stream; the same object serves as both the input
	// stream and the listener.
	DebuggingRichSeqIOListener debug =
		new DebuggingRichSeqIOListener(is);
	BufferedReader br = new BufferedReader(
		new InputStreamReader(debug));

	SymbolTokenization symParser = format.guessSymbolTokenization(debug);

	// Parse, sending all events to the debugging listener.
	format.readRichSequence(
		br,
		symParser,
		debug,
		ns);

This dumps out everything as it is read, and reports each event as it
happens, in-line with the input being interpreted.
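
For anyone who wants to see the stream half of the idea without BioJava on
the classpath, here is a rough sketch. EchoingInputStream and EchoDemo are
made-up names for illustration, not the BioJava class, and the listener half
is left out entirely. It only shows how a BufferedInputStream subclass can
copy whatever is read to a log.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the stream half of a debugging listener:
// every byte handed to the caller is also copied to a log stream.
// Note: bytes re-read after mark()/reset() would be logged twice.
class EchoingInputStream extends BufferedInputStream {
    private final OutputStream log;

    EchoingInputStream(InputStream in, OutputStream log) {
        super(in);
        this.log = log;
    }

    @Override
    public synchronized int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            log.write(b);
        }
        return b;
    }

    @Override
    public synchronized int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            log.write(buf, off, n);
        }
        return n;
    }
}

class EchoDemo {
    public static void main(String[] args) throws IOException {
        byte[] fasta = ">seq1\nACGT\n".getBytes(StandardCharsets.US_ASCII);
        EchoingInputStream in =
            new EchoingInputStream(new ByteArrayInputStream(fasta), System.out);
        BufferedReader br = new BufferedReader(
            new InputStreamReader(in, StandardCharsets.US_ASCII));
        // Draining the reader pulls every byte through the echoing stream.
        while (br.readLine() != null) {
            // a real parser would interpret the line here
        }
        System.out.flush();
    }
}
```

One thing to keep in mind with this kind of wrapper: a BufferedReader on top
pulls data in large chunks, so the echo can run ahead of the line the parser
is actually working on.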

Hope this helps?

cheers,
Richard
 

On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote:
> That might be a more elegant solution.
> 
> Could even make the InputStream implement RichSeqIOListener, so that it
> would both send data to the RichFormat and listen to what the RichFormat
> makes of the data.
> 
> The InputStreamIOListener could remember when the RichFormat emits a
> startXXX() event, record the line number, and start buffering all the
> data sent as the readLine() requests are made (while also sending it to
> the RichFormat). When the RichFormat emits the corresponding endXXX()
> event, the buffer can be cleared and the process starts again.
> 
> Only problem might be what to do when the RichFormat consumes data in 
> between emitting events (which is allowed).
> 
> - Mark
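
A minimal sketch of the buffering scheme described above, using only
java.io. SpanRecordingReader, startSpan(), endSpan(), spanStartLine() and
currentLineNumber() are hypothetical names, not BioJava API; the idea is
that a listener's startXXX()/endXXX() callbacks would call startSpan() and
endSpan(). As one possible answer to the data-consumed-between-events
problem, this sketch assumes the line read just before the start event is
the one that triggered it and seeds the buffer with that line. That is a
guess and will not hold for every format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: counts lines as they are handed out and, while a
// span is open, remembers every line so it can be dumped on failure.
class SpanRecordingReader extends BufferedReader {
    private int lineNumber = 0;      // lines handed out so far
    private String lastLine = null;  // most recent line handed out
    private boolean open = false;    // true between startSpan() and endSpan()
    private int spanStartLine = -1;  // line number where the current span began
    private final List<String> span = new ArrayList<String>();

    SpanRecordingReader(Reader in) {
        super(in);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        if (line != null) {
            lineNumber++;
            lastLine = line;
            if (open) {
                span.add(line);
            }
        }
        return line;
    }

    // Intended to be called from a startXXX() callback.
    void startSpan() {
        span.clear();
        open = true;
        if (lastLine != null) {
            // Assumption: the line just read is what triggered the event.
            spanStartLine = lineNumber;
            span.add(lastLine);
        } else {
            spanStartLine = lineNumber + 1;
        }
    }

    // Intended to be called from the matching endXXX() callback.
    // Returns the buffered lines and clears the buffer.
    List<String> endSpan() {
        List<String> copy = new ArrayList<String>(span);
        span.clear();
        open = false;
        return copy;
    }

    int currentLineNumber() {
        return lineNumber;
    }

    int spanStartLine() {
        return spanStartLine;
    }
}
```

If the parser throws while a span is open, the caller still holds the reader
and can report spanStartLine() together with the buffered lines.
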
> 
> 
> 
> 
> 
> Michael Heuer <heuermh at acm.org>
> Sent by: Michael Heuer <heuermh at shell3.shore.net>
> 06/07/2006 01:51 PM
> 
>  
>         To:     mark.schreiber at novartis.com
>         cc:     biojava-dev at biojava.org
>         Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface
> 
> 
> Mark Schreiber wrote:
> 
> > Hi all -
> >
> > I would like to propose a change to the RichFormat interface. I think we
> > should do this now as we haven't done a stable biojavax roll out yet so
> > interface changes should still be allowed. The additional methods would be:
> >
> > public String currentLine();
> > public int currentLineNumber();
> >
> > This would make debugging a lot easier, it would also make construction of
> > a RichSeqIOListener that logs and debugs much easier. I was trying to do
> > this a while back. I started a background process that parsed 6GB of
> > genbank records looking for records that failed. It worked ok but would be
> > much better with the ability to query the RichFormat in the above way. We
> > might even be able to make it a utility that people could run on suspect
> > files and generate standard bug reports to make it easier for us to debug
> > the parser code.
> >
> > What do people think??
> 
> Another possibility would be to leave this sort of progress tracking up
> to the client, in that they could wrap the InputStream in something like
> a CountingInputStream before passing it to the parser(s):
> 
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
> 
>    michael
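
To make the client-side option concrete without pulling in Commons IO, here
is a rough stand-in built on java.io.FilterInputStream.
ByteCountingInputStream and getCount() are hypothetical names chosen for
this sketch; the Commons IO class linked above is the real thing and may
differ in detail.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a counting wrapper: tracks how many bytes the
// consumer has pulled from (or skipped in) the underlying stream.
class ByteCountingInputStream extends FilterInputStream {
    private long count = 0;

    ByteCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    // Bytes consumed so far; a client could report this when a parse fails.
    long getCount() {
        return count;
    }
}
```

The count is a byte offset into the stream, not a line number, and if the
parser reads through a BufferedReader the offset will usually be somewhat
past the record that actually failed.
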
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416



