[Biojava-l] Genbank ASN.1 or XML parser

mark.schreiber at group.novartis.com mark.schreiber at group.novartis.com
Thu Jun 3 21:48:49 EDT 2004

Hi -

The general pattern of input in Biojava is:

BufferedReader -> SequenceFormat -> SeqIOListener -> SequenceIterator.

A specific example would be:

BufferedReader -> FastaFormat -> SequenceBuilderFactory -> 

The format object is responsible for parsing the input and generating 
events that the SequenceBuilderFactory listens for and uses to build one 
or more Sequence objects. The SequenceFormat implementation usually draws 
on a SymbolTokenization object which determines how characters are mapped 
to BioJava Symbols (e.g. 'a' is mapped to adenine).

Basically you make a parser that emits callback events to a registered 
SeqIOListener. If the listener is a SequenceBuilderFactory it generates 
Sequence objects based on those events. It is very similar to the SAX API 
for XML parsing.
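
The callback pattern described above can be sketched in plain Java. This is
only an illustration under stated assumptions: SeqListener, parse,
CollectingBuilder and parseToList are hypothetical names invented for this
sketch, not the BioJava interfaces, and the "sequence" that gets built is
just a "name=residues" string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an event-driven sequence parser; not the BioJava API. */
class FastaEventSketch {

    /** Hypothetical listener, loosely playing the role of a SeqIOListener. */
    interface SeqListener {
        void startSequence(String name);
        void addSymbols(String residues);
        void endSequence();
    }

    /** The "format": reads FASTA-like lines and emits events; it builds nothing itself. */
    static void parse(BufferedReader in, SeqListener listener) throws IOException {
        boolean open = false;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (open) {
                    listener.endSequence();
                }
                listener.startSequence(line.substring(1).trim());
                open = true;
            } else if (open) {
                listener.addSymbols(line);
            }
        }
        if (open) {
            listener.endSequence();
        }
    }

    /** A listener in the builder role: turns events into "name=residues" strings. */
    static class CollectingBuilder implements SeqListener {
        final List<String> built = new ArrayList<>();
        private String name;
        private StringBuilder residues;

        @Override
        public void startSequence(String name) {
            this.name = name;
            this.residues = new StringBuilder();
        }

        @Override
        public void addSymbols(String s) {
            // Stand-in for tokenization: normalise characters to lower case.
            residues.append(s.toLowerCase());
        }

        @Override
        public void endSequence() {
            built.add(name + "=" + residues);
        }
    }

    /** Convenience wrapper: wires reader -> parser -> builder for a String input. */
    static List<String> parseToList(String fasta) {
        CollectingBuilder builder = new CollectingBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(fasta))) {
            parse(in, builder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return builder.built;
    }
}
```

The point of the split is that a parser for a new format only has to emit
the same events; the listener that builds the objects does not change.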

Hope this helps,


"Gang Wu" <gwu at molbio.mgh.harvard.edu>
06/04/2004 12:52 AM
Please respond to gwu

        To:     Mark Schreiber/GP/Novartis at PH
        cc:     "Bio-Java" <biojava-l at biojava.org>, 
<biojava-l-bounces at portal.open-bio.org>
        Subject:        RE: [Biojava-l] Genbank ASN.1 or XML parser


I would be glad to write the parser. Since I am pretty new to the BioJava
project, can anybody give me a guide on how to start? Thanks.

- Gang

-----Original Message-----
From: mark.schreiber at group.novartis.com
[mailto:mark.schreiber at group.novartis.com]
Sent: Wednesday, June 02, 2004 8:59 PM
To: gwu at molbio.mgh.harvard.edu
Cc: Bio-Java; biojava-l-bounces at portal.open-bio.org
Subject: Re: [Biojava-l] Genbank ASN.1 or XML parser

Hello -

At the moment there are no parsers for XEMBL, GenBankXML or ASN.1. They
could all be easily made if someone had the time. GenBankXML could
easily draw on a SAX or DOM parser to pass events to the BioJava
SequenceBuilders (using some kind of adapter). ASN.1 would need a more
custom parser, but because it is highly structured that shouldn't be too
hard.

Any volunteers?
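
The SAX adapter idea mentioned above could look roughly like the sketch
below. It uses the standard javax.xml.parsers and org.xml.sax classes. The
SeqListener interface and the Adapter and parseToList names are hypothetical,
invented for this sketch rather than taken from BioJava, and the element
names GBSeq, GBSeq_locus and GBSeq_sequence are assumed here to match the
GenBank XML being read; adjust them to the actual schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a SAX-to-listener adapter; not the BioJava API. */
class GbXmlAdapterSketch {

    /** Hypothetical listener, loosely playing the role of a SeqIOListener. */
    interface SeqListener {
        void startSequence();
        void setName(String name);
        void addSymbols(String residues);
        void endSequence();
    }

    /** Translates SAX callbacks into sequence events. */
    static class Adapter extends DefaultHandler {
        private final SeqListener listener;
        private final StringBuilder text = new StringBuilder();

        Adapter(SeqListener listener) {
            this.listener = listener;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
            if ("GBSeq".equals(qName)) { // assumed record element name
                listener.startSequence();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            switch (qName) {
                case "GBSeq_locus":    // assumed element holding the record name
                    listener.setName(text.toString().trim());
                    break;
                case "GBSeq_sequence": // assumed element holding the residues
                    listener.addSymbols(text.toString().trim());
                    break;
                case "GBSeq":
                    listener.endSequence();
                    break;
                default:
                    break;
            }
        }
    }

    /** Parses an XML string and returns one "name=residues" entry per record. */
    static List<String> parseToList(String xml) {
        List<String> built = new ArrayList<>();
        SeqListener collector = new SeqListener() {
            private String name = "";
            private final StringBuilder residues = new StringBuilder();

            @Override public void startSequence() { name = ""; residues.setLength(0); }
            @Override public void setName(String n) { name = n; }
            @Override public void addSymbols(String s) { residues.append(s); }
            @Override public void endSequence() { built.add(name + "=" + residues); }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new Adapter(collector));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return built;
    }
}
```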

- Mark

"Gang Wu" <gwu at molbio.mgh.harvard.edu>
Sent by: biojava-l-bounces at portal.open-bio.org
06/02/2004 11:21 PM
Please respond to gwu

        To:     "Bio-Java" <biojava-l at biojava.org>
        Subject:        [Biojava-l] Genbank ASN.1 or XML parser

Hi everyone,

I just tried out the APIs for parsing Genbank format files. Though it works
well, I still wonder if there are APIs for parsing Genbank files in ASN.1 or
XML formats, because the Genbank format was designed for human beings and
ASN.1 and XML formats should be more reliable for data exchange.

Gang Wu

Biojava-l mailing list  -  Biojava-l at biojava.org
