[Biojava-l] Genbank ASN.1 or XML parser

mark.schreiber at group.novartis.com
Thu Jun 3 21:48:49 EDT 2004


Hi -

The general pattern of input in Biojava is:

BufferedReader -> SequenceFormat -> SeqIOListener -> SequenceIterator.

A specific example would be:

BufferedReader -> FastaFormat -> SequenceBuilderFactory -> 
SequenceIterator


The format object is responsible for parsing the input and generating 
events that the SequenceBuilderFactory listens for and uses to build one 
or more Sequence objects. The SequenceFormat implementation usually draws 
on a SymbolTokenization object which determines how characters are mapped 
to BioJava Symbols (e.g. 'a' is mapped to adenine).


Basically you make a parser that emits callback events to a registered 
SeqIOListener. If the listener is a SequenceBuilderFactory it generates 
Sequence objects based on those events. It is very similar to the SAX API 
for XML parsing.
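
To make the callback idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use the BioJava classes: SeqListener, parseFasta and CollectingListener are hypothetical names standing in for the roles of SeqIOListener, a SequenceFormat and a SequenceBuilderFactory, and the "sequences" are plain strings rather than Sequence objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class EventParserSketch {

    /** Hypothetical listener interface (stands in for the SeqIOListener role). */
    interface SeqListener {
        void startSequence(String name);
        void addSymbols(String residues);
        void endSequence();
    }

    /** Hypothetical parser (stands in for the SequenceFormat role): reads FASTA, emits events. */
    static void parseFasta(BufferedReader in, SeqListener listener) throws IOException {
        boolean open = false;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A header line closes the previous record and opens a new one.
                if (open) {
                    listener.endSequence();
                }
                listener.startSequence(line.substring(1).trim());
                open = true;
            } else if (open) {
                listener.addSymbols(line);
            }
        }
        if (open) {
            listener.endSequence();
        }
    }

    /** Hypothetical builder: turns the events into "name:residues" strings. */
    static class CollectingListener implements SeqListener {
        final List<String> built = new ArrayList<>();
        private String name;
        private StringBuilder residues;

        public void startSequence(String name) {
            this.name = name;
            this.residues = new StringBuilder();
        }

        public void addSymbols(String r) {
            residues.append(r);
        }

        public void endSequence() {
            built.add(name + ":" + residues);
        }
    }

    /** Wires reader -> parser -> listener and returns what the listener built. */
    static List<String> readAll(String fasta) {
        CollectingListener listener = new CollectingListener();
        try {
            parseFasta(new BufferedReader(new StringReader(fasta)), listener);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return listener.built;
    }

    public static void main(String[] args) {
        // prints [seq1:acgtttga, seq2:ggcc]
        System.out.println(readAll(">seq1\nacgt\nttga\n>seq2\nggcc\n"));
    }
}
```

The parser knows nothing about what gets built; swapping in a different listener changes the output without touching the parsing code, which is the point of the design.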

Hope this helps,

Mark







"Gang Wu" <gwu at molbio.mgh.harvard.edu>
06/04/2004 12:52 AM
Please respond to gwu

 
        To:     Mark Schreiber/GP/Novartis at PH
        cc:     "Bio-Java" <biojava-l at biojava.org>, 
<biojava-l-bounces at portal.open-bio.org>
        Subject:        RE: [Biojava-l] Genbank ASN.1 or XML parser


Hi,

I would be glad to write the parser. Since I am pretty new to the BioJava
project, can anybody give me some guidance on how to start? Thanks.

- Gang


-----Original Message-----
From: mark.schreiber at group.novartis.com
[mailto:mark.schreiber at group.novartis.com]
Sent: Wednesday, June 02, 2004 8:59 PM
To: gwu at molbio.mgh.harvard.edu
Cc: Bio-Java; biojava-l-bounces at portal.open-bio.org
Subject: Re: [Biojava-l] Genbank ASN.1 or XML parser


Hello -

At the moment there are no parsers for XEMBL, GenBankXML or ASN.1. They
could all be easily made if someone had the time. GenBankXML could
easily draw on a SAX or DOM parser to pass events to the BioJava
SequenceBuilders (using some kind of adapter). ASN.1 would need a more
custom parser but because it is highly structured that shouldn't be too
hard.
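
As a rough illustration of the adapter idea for the XML case, the sketch below uses the JDK's SAX parser and forwards selected elements to a listener. It is only a sketch under assumptions: SeqListener and Adapter are hypothetical names, not BioJava classes, and it assumes the GBSeq, GBSeq_locus and GBSeq_sequence element names of NCBI's GBSeq XML, ignoring features and annotations entirely.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class GbXmlAdapterSketch {

    /** Hypothetical listener interface (stands in for the SeqIOListener role). */
    interface SeqListener {
        void startSequence();
        void setName(String name);
        void addSymbols(String residues);
        void endSequence();
    }

    /** Adapter: translates SAX callbacks into sequence events. */
    static class Adapter extends DefaultHandler {
        private final SeqListener listener;
        private final StringBuilder text = new StringBuilder();

        Adapter(SeqListener listener) {
            this.listener = listener;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
            if (qName.equals("GBSeq")) {
                listener.startSequence();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver text in several chunks, so accumulate it.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            switch (qName) {
                case "GBSeq_locus":
                    listener.setName(text.toString().trim());
                    break;
                case "GBSeq_sequence":
                    listener.addSymbols(text.toString().trim());
                    break;
                case "GBSeq":
                    listener.endSequence();
                    break;
                default:
                    break; // all other elements are ignored in this sketch
            }
        }
    }

    /** Parses the XML and returns one "name:residues" string per GBSeq element. */
    static List<String> readAll(String xml) {
        final List<String> built = new ArrayList<>();
        SeqListener collector = new SeqListener() {
            private String name = "";
            private final StringBuilder residues = new StringBuilder();

            public void startSequence() {
                name = "";
                residues.setLength(0);
            }

            public void setName(String n) {
                name = n;
            }

            public void addSymbols(String r) {
                residues.append(r);
            }

            public void endSequence() {
                built.add(name + ":" + residues);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new Adapter(collector));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return built;
    }

    public static void main(String[] args) {
        String xml = "<GBSet><GBSeq><GBSeq_locus>DEMO1</GBSeq_locus>"
                + "<GBSeq_sequence>acgt</GBSeq_sequence></GBSeq></GBSet>";
        // prints [DEMO1:acgt]
        System.out.println(readAll(xml));
    }
}
```

A real adapter would forward to the BioJava listener interface instead of the hypothetical one here, and would also have to map features and annotations.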

Any volunteers?

- Mark





"Gang Wu" <gwu at molbio.mgh.harvard.edu>
Sent by: biojava-l-bounces at portal.open-bio.org
06/02/2004 11:21 PM
Please respond to gwu


        To:     "Bio-Java" <biojava-l at biojava.org>
        cc:
        Subject:        [Biojava-l] Genbank ASN.1 or XML parser


Hi everyone,

I just tried out the APIs for parsing Genbank format files. Though it
works well, I still wonder if there are APIs for parsing Genbank files in
ASN.1 or XML formats, because the Genbank format was designed for human
readers and the ASN.1 and XML formats should be more reliable for data
exchange.

Gang Wu

_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l








