[Biojava-l] Blast-xml parser

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Wed, 6 Mar 2002 10:48:11 -0600


Hi,

As an FYI, people should note that at this time the -F option of blastall
(query filtering) has no effect when combined with the -m 7 option (XML
output); the XML run always behaves as if -F False had been given.  Not a
big problem, but I was confused for a bit when my normal output and XML
output were giving me different results.  Running the default (text) output
with -F False as well made the results the same.

Thankfully, NCBI confirmed this for me and is looking into it.

-Mat


-----Original Message-----
From: Jason Stajich [mailto:jason@cgt.mc.duke.edu]
Sent: Wednesday, March 06, 2002 10:41 AM
To: edda.koopmann.ek@bayer-ag.de
Cc: biojava-l@biojava.org; Mathieu_Wiepert.Mathieu@mayo.edu
Subject: Re: [Biojava-l] Blast-xml parser


We have a solution for this in bioperl.  Consider this script (using the
live CVS code, or 1.0alpha2-rc once it is out this weekend):

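#!/usr/bin/perl -w
use strict;

use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# Read BLAST XML (blastall -m 7) from the file named on the command line.
my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => shift @ARGV);

# No -file is given for the output stream, so the HTML goes to STDOUT.
my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new();
my $out = Bio::SearchIO->new(-writer => $writer);

# Only the first result in the input file is written.
$out->write_result($in->next_result);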

---

Run it like this:
% perl htmlwriter.pl file.xml > file.html


On Wed, 6 Mar 2002 edda.koopmann.ek@bayer-ag.de wrote:

> Hi, there,
> I saw your mail, while looking desperately for a possibility to convert
> blast output in xml format back to simple text output for simple
> biologists like me.
> Any help at any point from anybody? That would be great!
>
> Thanks a lot!
>
> Best wishes
>
> Edda
>
> ***************************************************************************
> Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
> Fri, 8 Jun 2001 07:35:26 -0500
>
>
>
> My 2 cents...
>
> Thank you for pointing out jaxb, that looks like just what I need at the
> moment :)
>
> With regard to your other comments, I ditto Simon on the use of the SAX
> framework.  It saved me tons of time.  When the Biojava SAX components were
> first written, I believe there was no XML format for BLAST output from any
> program.  While I was adding a little functionality, XML output had just
> arrived at NCBI, and GCG didn't have it yet.  Now that these things exist,
> you may not even need the Biojava SAX parser if you are comfortable with
> XSLT.  The use I saw for parsing BLAST was to get the interesting bits out
> of a file to build a datamining tool.  I saw my possibilities for dealing
> with Blast output as, among other things,
> - a content handler in Java with the Biojava SAX2-compliant parser and a
>   text Blast file
> - a content handler in Java with a SAX2-compliant parser and an XML Blast
>   file
> - a stylesheet in Java with the Xalan XSLT processor
> - a standalone XSLT processor like Saxon against text Blast files, with
>   the Biojava SAX parser plugged in
> - a standalone XSLT processor like Saxon against XML BLAST files.
>
> This list is not exhaustive, I am sure, and there are different reasons
> people might want to use each of them.  One reason to go with plain SAX
> rather than XSLT, as Simon has pointed out to me before, is that if you
> have very large blast files (and I do), XSLT is not great: it usually
> tries to instantiate your whole document in memory.  A SAX parser is then
> just the trick.  There are ways around this, but I have not explored them.
>
> I can certainly see possibilities to take blast output (in either form,
> text or XML) and constitute Biojava objects with direct binding, using
> jaxb, if that is what it can do.  All the Java solutions above could use
> that quite nicely.  So, who wants to volunteer to look into this? :)
>
>
> -mat
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
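
The second option in the quoted list (a content handler in Java with a
SAX2-compliant parser and an XML Blast file) might be sketched as follows,
using only the SAX classes shipped with the JDK.  The class name
BlastXmlHits, the hspLines method and the SAMPLE data are hypothetical; only
the element names Hit_def and Hsp_evalue come from NCBI's BLAST XML.  Real
files typically start with a DOCTYPE declaration, which is not handled here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: pulls hit definitions and HSP e-values out of BLAST XML.
class BlastXmlHits {

    // Trimmed, made-up fragment in the shape of "blastall -m 7" output.
    static final String SAMPLE =
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
      + "<Hit><Hit_def>hypothetical protein A</Hit_def><Hit_hsps>"
      + "<Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp>"
      + "<Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp>"
      + "</Hit_hsps></Hit>"
      + "<Hit><Hit_def>hypothetical protein B</Hit_def><Hit_hsps>"
      + "<Hsp><Hsp_evalue>3.5</Hsp_evalue></Hsp>"
      + "</Hit_hsps></Hit>"
      + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";

    // Streams through the XML and returns one "definition<TAB>e-value" line per HSP.
    static List<String> hspLines(String xml) {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String hitDef = "";

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("Hit_def".equals(qName)) {
                    hitDef = text.toString().trim();
                } else if ("Hsp_evalue".equals(qName)) {
                    lines.add(hitDef + "\t" + text.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse BLAST XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : hspLines(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

The handler keeps only the current hit definition and a small text buffer,
not a tree of the whole document, which is the property that makes SAX
attractive for very large result files.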

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu
