[Biojava-l] Blast-xml parser

edda.koopmann.ek@bayer-ag.de edda.koopmann.ek@bayer-ag.de
Wed, 6 Mar 2002 16:49:38 +0100


Hi there,
I saw your mail while looking desperately for a way to convert BLAST
output in XML format back to plain text output for simple biologists like me.
Can anybody help with this at any point? That would be great!

Thanks a lot!

Best wishes

Edda





*************************************************************************************
Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Fri, 8 Jun 2001 07:35:26 -0500




My 2 cents...

Thank you for pointing out jaxb, that looks like just what I need at the
moment :)

With regard to your other comments, I second Simon on the use of the SAX
framework.  It saved me tons of time.  When the Biojava SAX components were
first written, I believe there was no XML format for BLAST output from any
program.  While I was adding a little functionality, NCBI had just introduced
XML output, and GCG didn't have it yet.  Now that these things exist,
you may not even need the Biojava SAX parser if you are comfortable with
XSLT.  The use I saw for parsing BLAST was to get interesting bits from a
file to build a data-mining tool.  I saw my options for dealing with
BLAST output as, among other things:
- a content handler in Java with the Biojava SAX2-compliant parser and a text
BLAST file
- a content handler in Java with a SAX2-compliant parser and an XML BLAST file
- a stylesheet applied from Java with the Xalan XSLT processor
- a standalone XSLT processor like Saxon against text BLAST files with the
Biojava SAX parser plugged in
- a standalone XSLT processor like Saxon against XML BLAST files.
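To make the stylesheet route a bit more concrete, here is a minimal sketch that applies an XSLT stylesheet to XML BLAST output from Java through the standard JAXP transform API. The class name BlastXsltSketch and the method toText are made up for illustration, and the element names (Hit, Hit_id, Hit_def, Hit_len) are assumed to follow the NCBI BlastOutput DTD, so check them against your own files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Biojava.
class BlastXsltSketch {

    // Stylesheet that prints one tab-separated line per hit:
    // id, definition, length. Element names are assumed from the NCBI DTD.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//Hit'>"
        + "<xsl:value-of select='Hit_id'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hit_def'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hit_len'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms a BLAST XML document (given as a string) into plain text.
    static String toText(String blastXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(blastXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Calling BlastXsltSketch.toText(xml) on a document with one hit should give a single line with the three fields separated by tabs. Real BLAST XML files usually carry a DOCTYPE declaration, which the underlying parser may try to resolve, so that may need handling separately.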

This list is not exhaustive, I am sure, and there are different reasons
people might want to use each option.  One reason to go with plain SAX rather
than XSLT, as Simon has pointed out to me before, is that XSLT copes poorly
with very large BLAST files (and I have them).  An XSLT processor usually
tries to build the whole document in memory, whereas a SAX parser handles it
as a stream of events, which is just the trick.  There are ways around this,
but I have not explored them.
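For the large-file case, a plain SAX content handler might look roughly like the sketch below. It uses only the SAX classes that ship with the JDK, not the Biojava parser, and writes each hit out as soon as its closing tag arrives, so nothing beyond the current hit is held in memory. The names BlastSaxSketch, HitHandler and summarize are hypothetical, and the element names (Hit, Hit_id, Hit_def) are again assumed from the NCBI BlastOutput DTD.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Biojava.
class BlastSaxSketch {

    // Collects the id and definition of the current hit and prints them
    // when the enclosing <Hit> element ends.
    static class HitHandler extends DefaultHandler {
        private final PrintWriter out;
        private final StringBuilder text = new StringBuilder();
        private String hitId;
        private String hitDef;

        HitHandler(PrintWriter out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            switch (qName) {
                case "Hit_id":
                    hitId = text.toString().trim();
                    break;
                case "Hit_def":
                    hitDef = text.toString().trim();
                    break;
                case "Hit":
                    out.print(hitId + "\t" + hitDef + "\n");
                    hitId = null;
                    hitDef = null;
                    break;
                default:
                    break;
            }
        }
    }

    // Streams BLAST XML from 'in' and writes one line per hit to 'out'.
    static void summarize(Reader in, PrintWriter out) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(in), new HitHandler(out));
            out.flush();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }
}
```

Passing a FileReader for the BLAST file and a PrintWriter wrapped around System.out to summarize would print the hit list to the console. The same handler structure could be extended to pick up HSP scores or alignments if those are the interesting bits.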

I can certainly see possibilities to take BLAST output (in either form, text
or XML) and construct Biojava objects with direct binding, using jaxb, if
that is what it can do.  All the Java solutions above could use that quite
nicely.  So, who wants to volunteer to look into this? :)


-mat