[Biojava-l] blast xml parser

xling xling@tularik.com
Fri, 8 Jun 2001 22:21:25 -0700


My comments are below:


>> Compare to bioperl, the demo code is just a "proof of concept" rather than
>> the implementation library can be of real use.  Correct me if I am wrong.

>You are wrong.   The biojava Blast-like parsing package is scalable ways that
>the bioperl library is not.   I think you have missed the whole point of SAX,
>which is to provide an event-based parsing framework.

>So the point is, yes of cource you *need* objects.  But not everyone needs the
>*same* objects.  There exists a bunch of objects in biojava designed for the
>purpose of holding results from searches, so if you don't have special needs of
>your own, then you could probably use these (Keith has used these to hold the
>results from FASTA searches).

I have to say I disagree with you. The point here is not whether event-based
parsing is a good thing. It is how to abstract away the implementation
details so that users of the biojava library can simply send messages to
biojava objects to obtain information, rather than deal with the parsing
internals. Yes, I NEED objects! Not everyone needs the same objects, but you
can more or less imagine what a user would need from a blast or hmmer result
object, if biojava supported one.  XML parsing and object binding have to go
side by side. Unfortunately, biojava does not bind the parsing utility to
the rest of the biojava objects.  Personally, I don't think it is a good idea
to push this onto the user and call it a flexible option. A result file is a
result file, and the library user should not need to worry about how the
parsing is implemented (event-based or not should not be a concern here).
All they care about is the easiest way to retrieve the information and pass
it to other objects in the program.  I think the object-oriented Perl in
bioperl does a terrific job: all I need to do is instantiate a blast object
from the result file and obtain information by sending messages to that
object.  All the implementation details are abstracted away behind APIs.
Whether the bioperl code is scalable is a question that someone who is on
both mailing lists may be able to address.  As far as I can see, scalability
really depends on how many different file formats you need to support, and
the most common formats are only a few. I guess if a bioinformatics library
does not support easy blast/hmmer parsing and object binding, then it should
really reconsider its strategy and design.
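
To make the kind of binding I mean concrete, here is a minimal sketch. It
assumes a made-up XML layout (a <result> element containing
<hit id="..." score="..."/> elements) and two hypothetical classes, Hit and
SearchResult. None of these are biojava classes, and this is not the format
the biojava Blast-like parser actually emits. It uses only the standard JAXP
SAX API, so the parsing stays event-based underneath while the caller sees
nothing but objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical value object for one hit; not part of biojava.
final class Hit {
    final String id;
    final double score;

    Hit(String id, double score) {
        this.id = id;
        this.score = score;
    }
}

// Hypothetical facade: the caller asks for objects and never sees SAX.
final class SearchResult {
    private final List<Hit> hits = new ArrayList<>();

    List<Hit> hits() {
        return hits;
    }

    // Parses the assumed <result><hit id=".." score=".."/></result> layout.
    static SearchResult parse(String xml) {
        final SearchResult result = new SearchResult();

        // The handler is the binding layer: each SAX event of interest
        // is turned into an object and attached to the result.
        DefaultHandler binder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("hit".equals(qName)) {
                    String id = atts.getValue("id");
                    double score = Double.parseDouble(atts.getValue("score"));
                    result.hits.add(new Hit(id, score));
                }
            }
        };

        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), binder);
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers get one failure type.
            throw new IllegalArgumentException("could not parse result XML", e);
        }
        return result;
    }
}
```

With something like this in the library, user code reduces to calling
SearchResult.parse(xml) and iterating over hits(). Anyone with special needs
could still write their own handler against the same events.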

Again, I want to clarify the purpose of this email.  I have no doubt that the
event-based parsing implementation by CAT was, and still is, a contribution
to this open source distribution.  But I think a better design that abstracts
away implementation details, together with an implementation that binds the
parsing utility to the rest of the biojava objects, is important in order to
broaden the biojava audience and make their lives easier.

Thanks.

Bruce Ling, Ph.D.
Tularik, Inc
http://www.tularik.com