[Biojava-l] blast xml parser

Bruce Ling xling@tularik.com
Sat, 9 Jun 2001 10:08:51 -0700


I agree with Ewan, and here are my suggestions.

1. It would be nice to have object-level BioJava support bound to the
results of event-based parsing (implementations for BLAST and HMMER would
get a lot of mileage there). This would also serve as a good demo of how to
integrate event-based parsing utilities with the rest of the BioJava objects
for end users of the BioJava library.

2. I truly think support for NCBI XML output is something BioJava
developers should seriously look into. We have to face the fact that NCBI's
products will be the standard in this business as long as NCBI owns the
data. The same argument applies to EMBL. Ideally, the implementation would
be a SAX-based parser for this format, bound to the BioJava objects. The
advantage of SAX over DOM is performance: SAX streams events as it reads,
instead of building the whole document tree in memory first. The new JAXB
utilities are another option, but I have doubts about their performance.
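To make point 2 concrete, here is a minimal sketch of a SAX handler that binds NCBI BLAST XML elements to objects as they stream past, so no DOM tree is ever built. The element names (Hit, Hit_id, Hit_def, Hsp_evalue) follow NCBI's BLAST XML output as I understand it; the classes BlastXmlSaxSketch and Hit are hypothetical illustrations, not BioJava API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BlastXmlSaxSketch {

    /** Hypothetical minimal hit object; NOT a BioJava class. */
    public static final class Hit {
        public final String id;
        public final String description;
        public final double bestEvalue; // smallest E-value over this hit's HSPs

        Hit(String id, String description, double bestEvalue) {
            this.id = id;
            this.description = description;
            this.bestEvalue = bestEvalue;
        }
    }

    /** SAX handler: creates one Hit each time a <Hit> element closes. */
    static final class HitHandler extends DefaultHandler {
        final List<Hit> hits = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String id;
        private String description;
        private double bestEvalue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // collect only the text of the element just opened
            if (qName.equals("Hit")) {
                id = null;
                description = null;
                bestEvalue = Double.POSITIVE_INFINITY;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            switch (qName) {
                case "Hit_id": id = value; break;
                case "Hit_def": description = value; break;
                case "Hsp_evalue":
                    bestEvalue = Math.min(bestEvalue, Double.parseDouble(value));
                    break;
                case "Hit": hits.add(new Hit(id, description, bestEvalue)); break;
                default: break; // all other elements are ignored in this sketch
            }
            text.setLength(0);
        }
    }

    /** Streams the XML through the handler; memory grows with hits kept, not document size. */
    public static List<Hit> parse(String xml) {
        HitHandler handler = new HitHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("BLAST XML parse failed", e);
        }
        return handler.hits;
    }

    public static void main(String[] args) {
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_id>gi|1</Hit_id><Hit_def>example protein</Hit_def>"
                + "<Hit_hsps><Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp>"
                + "<Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        for (Hit h : parse(xml)) {
            System.out.println(h.id + "\t" + h.description + "\t" + h.bestEvalue);
        }
    }
}
```

A real NCBI file also declares an external DTD, which the parser may try to fetch; handling that is left out of this sketch.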

Thanks.

Bruce Ling, Ph.D.
Tularik, Inc
http://www.tularik.com


-----Original Message-----
From: Ewan Birney [mailto:birney@ebi.ac.uk]
Sent: Saturday, June 09, 2001 2:35 AM
To: xling
Cc: Simon Brocklehurst; 'biojava-l@biojava.org'
Subject: RE: [Biojava-l] blast xml parser




A view from bioperl:


I would claim that you do need layered levels of abstraction. The event
parsing SAX model is something we are very jealous of in bioperl because
we understand its flexibility and how easily experts can write seriously
lean-and-mean parsers with it.

Maintenance of parsers has been one of our headaches over time.

However, we would always maintain the object-level representation of
BLAST/HMMER parsing results. Although this is just one mapping to in-memory
objects, it is fine for at least 50% of use cases, and we can guarantee
that the fine details of how these objects interact with the rest of the
package work.

These are ideal for small-scale uses and throwaway scripts, and are
actually an OK fit for large-scale cases too (BPLite is, as the name
implies, lite on code and parsing). Scaling is mainly achieved by having
one set of objects per result: not ideal, but good enough.


So I guess both views of "parsing BLAST" are correct. They are just
nested.
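A minimal sketch of that nesting: an event layer (a listener interface that an expert can implement directly for a lean parser) with an object layer built on top of it, which hands over one in-memory result per query and then drops its reference, so the builder itself holds at most one result at a time. All names here (BlastEventListener, ResultBuilder, BlastResult, replay) are hypothetical; they are neither BioJava nor BioPerl API, and replay merely stands in for a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LayeredBlastSketch {

    /** Event layer (hypothetical): low-level callbacks, no objects retained. */
    public interface BlastEventListener {
        void startResult(String queryId);
        void hit(String hitId, double evalue);
        void endResult();
    }

    /** Object layer (hypothetical): one in-memory result per query. */
    public static final class BlastResult {
        public final String queryId;
        public final List<String> hitIds = new ArrayList<>();
        public final List<Double> evalues = new ArrayList<>();

        BlastResult(String queryId) { this.queryId = queryId; }
    }

    /** Adapter that nests the object layer on top of the event layer. */
    public static final class ResultBuilder implements BlastEventListener {
        private final Consumer<BlastResult> sink;
        private BlastResult current;

        public ResultBuilder(Consumer<BlastResult> sink) { this.sink = sink; }

        @Override public void startResult(String queryId) { current = new BlastResult(queryId); }

        @Override public void hit(String hitId, double evalue) {
            current.hitIds.add(hitId);
            current.evalues.add(evalue);
        }

        @Override public void endResult() {
            sink.accept(current); // hand the finished result to the caller...
            current = null;       // ...and keep nothing: one set of objects per result
        }
    }

    /** Stand-in for a real parser (hypothetical): replays a fixed event stream. */
    public static void replay(BlastEventListener listener) {
        listener.startResult("query1");
        listener.hit("hitA", 1e-40);
        listener.hit("hitB", 3e-5);
        listener.endResult();
        listener.startResult("query2");
        listener.hit("hitC", 0.01);
        listener.endResult();
    }

    public static void main(String[] args) {
        // Object-level users just receive whole BlastResult objects, one at a time.
        replay(new ResultBuilder(r ->
                System.out.println(r.queryId + ": " + r.hitIds.size() + " hit(s)")));
    }
}
```

The design choice is that the object layer is only one listener among possible others: code that needs speed implements BlastEventListener itself, while everyone else gets objects whose interaction with the rest of the package can be tested once.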


ewan

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------