[Biojava-l] ACE parser

Richard Holland holland at ebi.ac.uk
Fri Jul 13 07:34:01 UTC 2007


Hi Jon!

There is still no ACE parser in BioJava that I know about, so a new
parser would be most welcome. Thanks for volunteering!

The way we write parsers is to split the task into various stages:

    xxx : some BioJava object that can represent all the data in the
file (e.g. Sequence, or ABIChromatogram).

    xxxFormat : actually reads the file, accepts an xxxListener as a
parameter whilst doing so and signals events to that listener as it
processes various parts of the file. Also has a method for writing a new
file based on some existing xxx object. The xxxFormat input parts always
work from InputStreams, with convenience methods that accept Files (or
sometimes even URLs) and delegate to the main InputStream methods. Same
goes for the output parts - OutputStream by default, with appropriate
File/URL/etc. convenience methods.

    xxxListener : an interface that listens for the 'events' signalled
by xxxFormat (e.g. startNewSequence(), addSequenceChunk(),
startFeature(), addLocation(), endSequence(), etc.).

    xxxBuilder : implements xxxListener and has an extra method to
retrieve an xxx object containing all the data it has received so far
(for instance, the builders that listen for events from sequence files
build Sequence objects).
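
As a rough illustration of how those four pieces might fit together for
ACE, here is a minimal sketch. All of the names in it (AceContig,
AceListener, AceFormat, AceBuilder and their methods) are hypothetical
and are not existing BioJava classes. It also assumes a heavily
simplified view of the file: only "CO" contig records are handled, the
consensus bases are taken to end at the first blank line, and the
writing half of the format is left out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// "xxx": hypothetical data object for one contig (not a BioJava class).
class AceContig {
    final String name;
    final String bases;

    AceContig(String name, String bases) {
        this.name = name;
        this.bases = bases;
    }
}

// "xxxListener": hypothetical event interface, one callback per event.
interface AceListener {
    void startContig(String name);
    void addSequenceChunk(String chunk);
    void endContig();
}

// "xxxFormat": reads the stream and signals events to the listener.
// Simplifying assumption: only "CO" records are recognised, and the
// consensus sequence is taken to end at the first blank line.
class AceFormat {
    void read(InputStream in, AceListener listener) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
        boolean inContig = false;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("CO ")) {
                if (inContig) {
                    listener.endContig();
                }
                // Second field of the CO header is the contig name.
                listener.startContig(trimmed.split("\\s+")[1]);
                inContig = true;
            } else if (inContig && trimmed.isEmpty()) {
                listener.endContig();
                inContig = false;
            } else if (inContig) {
                listener.addSequenceChunk(trimmed);
            }
        }
        if (inContig) {
            listener.endContig();
        }
    }

    // Convenience method: delegates to the main InputStream method.
    void read(File file, AceListener listener) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            read(in, listener);
        }
    }
}

// "xxxBuilder": a listener that assembles AceContig objects from events.
class AceBuilder implements AceListener {
    private final List<AceContig> contigs = new ArrayList<>();
    private String name;
    private StringBuilder bases;

    @Override
    public void startContig(String name) {
        this.name = name;
        this.bases = new StringBuilder();
    }

    @Override
    public void addSequenceChunk(String chunk) {
        bases.append(chunk);
    }

    @Override
    public void endContig() {
        contigs.add(new AceContig(name, bases.toString()));
    }

    // The "extra method": returns everything built so far.
    List<AceContig> getContigs() {
        return contigs;
    }
}

class AceSketchDemo {
    public static void main(String[] args) throws IOException {
        String ace = "AS 1 0\n\nCO Contig1 8 0 1 U\nACGT\nTTGA\n\nBQ\n20 20\n";
        AceBuilder builder = new AceBuilder();
        new AceFormat().read(
                new ByteArrayInputStream(ace.getBytes(StandardCharsets.US_ASCII)),
                builder);
        for (AceContig contig : builder.getContigs()) {
            System.out.println(contig.name + " " + contig.bases.length());
        }
    }
}
```

A real ACE parser would need further events for the read, quality and
alignment records; the point here is only the division of labour
between the format, the listener and the builder.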

The idea is that the xxxBuilder object will build a complete object with
as much relevant data from the file as possible, but if you don't want
that much information you can pass in your own xxxListener
implementation to xxxFormat which only listens to events representing
bits of the file it is interested in. There is usually a default
xxxListener implementation for every xxxListener interface with empty
methods that ignore everything, which xxxBuilder or your own custom
implementation then extends, overriding the methods which supply the
data that it wants.
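
To make the default-implementation idea concrete, here is a small
sketch. The names (AceListener, AceListenerAdapter,
ContigNameCollector) are invented for illustration and are not BioJava
classes. The listener interface is declared inside the snippet so that
it compiles on its own, and the events are fired by hand where a format
class would normally fire them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface, declared here so the snippet is
// self-contained.
interface AceListener {
    void startContig(String name);
    void addSequenceChunk(String chunk);
    void endContig();
}

// Default implementation: every method is empty, so every event is ignored.
class AceListenerAdapter implements AceListener {
    @Override
    public void startContig(String name) { }

    @Override
    public void addSequenceChunk(String chunk) { }

    @Override
    public void endContig() { }
}

// Custom listener that is only interested in contig names. It overrides
// one method and inherits the do-nothing behaviour for the rest.
class ContigNameCollector extends AceListenerAdapter {
    private final List<String> names = new ArrayList<>();

    @Override
    public void startContig(String name) {
        names.add(name);
    }

    List<String> getNames() {
        return names;
    }
}

class AdapterSketchDemo {
    public static void main(String[] args) {
        ContigNameCollector collector = new ContigNameCollector();
        // Fired by hand here; a format class would normally do this.
        collector.startContig("Contig1");
        collector.addSequenceChunk("ACGT");
        collector.endContig();
        collector.startContig("Contig2");
        collector.endContig();
        System.out.println(collector.getNames());
    }
}
```

A full builder would extend the same adapter and override every method,
so adding a new event to the interface later only requires a new empty
method in the adapter rather than changes to every custom listener.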

cheers,
Richard

Warren, Jonathan wrote:
> Hi
> 
> I've seen posts related to people writing an ace file format parser
> (contig assembly output type
> http://bioportal.cgb.indiana.edu/docs/tools/cap3/aceform) but as yet I
> believe there is not one available in biojava?
> 
> I am thinking of writing one and contributing it to biojava.
> 
> Thinking about the design of it - has anyone got any advice or pointers?
> If I want to hide the data and mechanics from users I don't want to give
> access to all the data it gathers - but not knowing how people are going
> to use it implies that maybe I should give a lot of access to the data??
> 
>  
> Cheers
> 
> Jonathan.
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
> 


