[Biojava-dev] biojava3 BLAST parser

Deniz Koellhofer deniz.koellhofer at cambia.org
Wed Sep 1 06:21:52 UTC 2010


Hi Scooter,

I'm currently parsing the BLAST results into plain data containers, but
wouldn't mind integrating it more with existing BioJava3 modules if I find
some time. Pretty busy at the moment, but I will let you guys know if I get
any further.

Cheers,
Deniz

On Wed, Sep 1, 2010 at 12:11 AM, Scooter Willis <HWillis at scripps.edu> wrote:

> Deniz
>
> It would be great to formalize the BLAST XML results as Java classes. Do
> you have any interest in taking on the project?
>
> Capturing the BLAST alignment using the new alignment classes would be a
> very nice feature. I like using XPATH as the query language to select
> hits of interest, which should allow for a SAX-based approach that minimizes
> the impact of very large XML files. XPATH and SAX do appear to have some
> constraints, though
> (http://stackoverflow.com/questions/1863250/is-it-there-any-xpath-processor-for-sax-model).
>
> It probably makes sense to have a BLAST module that depends on the core and
> alignment modules.
>
> Thanks
>
> Scooter
>
>
>
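A minimal sketch of the SAX-based selection Scooter mentions, using only the JDK's javax.xml.parsers and org.xml.sax APIs and assuming the NCBI BLAST XML element names Hit, Hit_def and Hsp_evalue. BlastHitFilter and its filter method are hypothetical names, not part of BioJava. It keeps the Hit_def of every hit that has at least one HSP at or below an e-value cutoff, without building a DOM. Real BLAST output usually starts with a DOCTYPE pointing at NCBI's DTD, which a default parser may try to fetch; that case is not handled here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class (not BioJava API): streams NCBI BLAST XML with SAX and keeps
// the Hit_def of each hit that has at least one HSP with e-value <= maxEvalue.
public class BlastHitFilter extends DefaultHandler {
    private final double maxEvalue;
    private final List<String> selected = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentHitDef;
    private boolean currentHitPasses;

    public BlastHitFilter(double maxEvalue) {
        this.maxEvalue = maxEvalue;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting character data for this element
        if (qName.equals("Hit")) { // reset per-hit state
            currentHitDef = null;
            currentHitPasses = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "Hit_def":
                currentHitDef = text.toString().trim();
                break;
            case "Hsp_evalue":
                // Double.parseDouble understands exponent notation such as 1e-50
                if (Double.parseDouble(text.toString().trim()) <= maxEvalue) {
                    currentHitPasses = true;
                }
                break;
            case "Hit":
                if (currentHitPasses) {
                    selected.add(currentHitDef);
                }
                break;
            default:
                break;
        }
    }

    // One streaming pass over the document; only the selected names are kept.
    public static List<String> filter(String xml, double maxEvalue) {
        BlastHitFilter handler = new BlastHitFilter(maxEvalue);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse BLAST XML", e);
        }
        return handler.selected;
    }

    public static void main(String[] args) {
        // Toy input for illustration, not real BLAST output.
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_def>strong hit</Hit_def><Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "<Hit><Hit_def>weak hit</Hit_def><Hit_hsps><Hsp><Hsp_evalue>2.5</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        System.out.println(filter(xml, 1e-5)); // prints [strong hit]
    }
}
```

Because the handler only remembers the current hit, memory use stays flat regardless of file size. The price is that the selection logic is hand-written in the handler rather than expressed as an XPath query, which is roughly the trade-off the linked Stack Overflow question is about.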
> On Aug 31, 2010, at 8:49 AM, Deniz Koellhofer wrote:
>
> Hi Scooter,
>
> Thanks for the reply. I guess BlastXMLQuery is a good example of how to
> quickly extract information from a BLAST result.
>
> But in my opinion biojava3 should also have a BLAST parser that generates
> JavaBeans containing the complete BLAST result set, similar to what
> biojava1.7.1 was doing. So yeah, I'm after translating the XML elements to
> Java classes.
>
> Would something like that fit into one of the biojava3 modules, e.g.
> homology or I/O?
>
> Thanks,
> Deniz
>
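One way to get the JavaBeans Deniz describes is a SAX handler that fills plain bean classes while it streams the file. This is a sketch under stated assumptions: BlastHit, BlastHsp and BlastBeanBuilder are hypothetical classes, not BioJava3 types, and only a few Hit_* and Hsp_* fields are mapped (Iteration data, BlastOutput_param and most Hsp_* elements are left out for brevity).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical bean for one HSP; only a few Hsp_* fields are mapped.
class BlastHsp {
    private double bitScore;
    private double evalue;
    private String midline;

    public double getBitScore() { return bitScore; }
    public void setBitScore(double bitScore) { this.bitScore = bitScore; }
    public double getEvalue() { return evalue; }
    public void setEvalue(double evalue) { this.evalue = evalue; }
    public String getMidline() { return midline; }
    public void setMidline(String midline) { this.midline = midline; }
}

// Hypothetical bean for one database hit and its HSPs.
class BlastHit {
    private String id;
    private String def;
    private int length;
    private final List<BlastHsp> hsps = new ArrayList<>();

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getDef() { return def; }
    public void setDef(String def) { this.def = def; }
    public int getLength() { return length; }
    public void setLength(int length) { this.length = length; }
    public List<BlastHsp> getHsps() { return hsps; }
}

// Hypothetical SAX handler that turns Hit/Hsp elements into the beans above.
public class BlastBeanBuilder extends DefaultHandler {
    private final List<BlastHit> hits = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private BlastHit hit;
    private BlastHsp hsp;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // collect character data afresh for each element
        if (qName.equals("Hit")) {
            hit = new BlastHit();
        } else if (qName.equals("Hsp")) {
            hsp = new BlastHsp();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString();
        switch (qName) {
            case "Hit_id": hit.setId(value.trim()); break;
            case "Hit_def": hit.setDef(value.trim()); break;
            case "Hit_len": hit.setLength(Integer.parseInt(value.trim())); break;
            case "Hsp_bit-score": hsp.setBitScore(Double.parseDouble(value.trim())); break;
            case "Hsp_evalue": hsp.setEvalue(Double.parseDouble(value.trim())); break;
            // Not trimmed: spaces in the midline are alignment columns.
            case "Hsp_midline": hsp.setMidline(value); break;
            case "Hsp": hit.getHsps().add(hsp); break;
            case "Hit": hits.add(hit); break;
            default: break;
        }
    }

    public static List<BlastHit> parse(String xml) {
        BlastBeanBuilder handler = new BlastBeanBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse BLAST XML", e);
        }
        return handler.hits;
    }

    public static void main(String[] args) {
        // Toy input for illustration, not real BLAST output.
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits><Hit>"
                + "<Hit_id>gnl|BL_ORD_ID|0</Hit_id><Hit_def>example subject</Hit_def><Hit_len>120</Hit_len>"
                + "<Hit_hsps><Hsp><Hsp_bit-score>52.8</Hsp_bit-score><Hsp_evalue>3e-10</Hsp_evalue>"
                + "<Hsp_midline>||| ||||</Hsp_midline></Hsp></Hit_hsps>"
                + "</Hit></Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        BlastHit first = parse(xml).get(0);
        System.out.println(first.getDef() + " " + first.getHsps().get(0).getEvalue()); // prints example subject 3.0E-10
    }
}
```

Hsp_midline is stored untrimmed on purpose: its spaces mark alignment columns without a match, so trimming leading or trailing spaces would shift the midline against Hsp_qseq and Hsp_hseq.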
> On Tue, Aug 31, 2010 at 8:43 PM, Scooter Willis <HWillis at scripps.edu> wrote:
>
>> Deniz
>>
>> Can you provide some requirements regarding parsing the Blast XML. I tend
>> to use XPATH and the DOM object to get to the data elements of interest so
>> you already have the ability to load the Blast XML and work with the data.
>> The difficulty of "parsing" is not an issue with XML. The BlastXMLQuery is
>> an example of searching the Blast XML to get results. Are you wanting the
>> XML elements translated to Java classes?
>>
>> Thanks
>>
>> Scooter
>>
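For comparison, a minimal sketch of the DOM + XPath style Scooter describes, using the JDK's javax.xml.xpath package (an XPath 1.0 implementation). BlastXPathExample and hitsBelow are hypothetical names, not BioJava API. Per the XPath 1.0 specification, numbers have no exponent notation, so a string such as 1e-50 converts to NaN inside an XPath comparison; the sketch therefore uses XPath only to select nodes and compares e-values in Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not BioJava API): DOM + XPath selection of BLAST hits.
public class BlastXPathExample {

    // Returns the Hit_def of every hit with at least one HSP whose e-value <= maxEvalue.
    public static List<String> hitsBelow(String xml, double maxEvalue) {
        List<String> result = new ArrayList<>();
        try {
            // The whole document is loaded into memory as a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    "/BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit",
                    doc, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                Node hit = hits.item(i);
                // Relative path, evaluated with the current Hit as context node.
                NodeList evalues = (NodeList) xpath.evaluate(
                        "Hit_hsps/Hsp/Hsp_evalue", hit, XPathConstants.NODESET);
                for (int j = 0; j < evalues.getLength(); j++) {
                    // Compare in Java because XPath 1.0 numbers cannot be written as 1e-50.
                    double evalue = Double.parseDouble(evalues.item(j).getTextContent().trim());
                    if (evalue <= maxEvalue) {
                        result.add(xpath.evaluate("Hit_def", hit).trim());
                        break;
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not query BLAST XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy input for illustration, not real BLAST output.
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_def>strong hit</Hit_def><Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "<Hit><Hit_def>weak hit</Hit_def><Hit_hsps><Hsp><Hsp_evalue>2.5</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        System.out.println(hitsBelow(xml, 1e-5)); // prints [strong hit]
    }
}
```

This keeps queries short and readable, but the full DOM has to fit in memory, which is the cost that motivates the SAX discussion for very large result files.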
>> On Aug 31, 2010, at 2:46 AM, Deniz Koellhofer wrote:
>>
>> > Hi,
>> >
>> > I wanted to find out the current state of BLAST parsing efforts in
>> > biojava3, especially for NCBI BLAST XML output.
>> >
>> > I had a quick look and found some DOM-based code fragments in
>> > org.biojava3.genome.query.BlastXMLQuery. Is anybody already working on
>> > a more comprehensive SAX parser?
>> >
>> > The biojava1.7.1 blastxml parser seems to work fine; however, some of the
>> > tags in NCBI-BLASTN 2.2.23+ output, like Hsp_midline and BlastOutput_param,
>> > don't seem to get parsed properly in
>> > org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.
>> >
>> > Cheers,
>> > Deniz
>> >
>> > --
>> > Deniz Koellhofer
>> > Cambia
>> > Initiative for Open Innovation (IOI)
>> > Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia
>> > _______________________________________________
>> > biojava-dev mailing list
>> > biojava-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
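Related to the BlastOutput_param problem Deniz reports, a sketch of reading the search parameters generically: every Parameters_* element under BlastOutput_param is stored in a map keyed by the part after "Parameters_" (e.g. expect, sc-match), so tags added in newer BLAST+ versions are kept without per-tag code. BlastParamReader is a hypothetical class, and whether this approach addresses the specific failure in BlastXMLParserFacade is not established here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class (not BioJava API): collects every Parameters_* element found
// inside BlastOutput_param into a map, keyed by the part after "Parameters_".
public class BlastParamReader extends DefaultHandler {
    private static final String PREFIX = "Parameters_";
    private final Map<String, String> params = new LinkedHashMap<>(); // keeps document order
    private final StringBuilder text = new StringBuilder();
    private boolean inParam; // true while inside <BlastOutput_param>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("BlastOutput_param")) {
            inParam = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("BlastOutput_param")) {
            inParam = false;
        } else if (inParam && qName.startsWith(PREFIX)) {
            // No per-tag code: unfamiliar Parameters_* tags are kept as strings too.
            params.put(qName.substring(PREFIX.length()), text.toString().trim());
        }
    }

    public static Map<String, String> read(String xml) {
        BlastParamReader handler = new BlastParamReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse BLAST XML", e);
        }
        return handler.params;
    }

    public static void main(String[] args) {
        // Toy input for illustration, not real BLAST output.
        String xml = "<BlastOutput><BlastOutput_param><Parameters>"
                + "<Parameters_expect>10</Parameters_expect>"
                + "<Parameters_sc-match>1</Parameters_sc-match>"
                + "<Parameters_sc-mismatch>-2</Parameters_sc-mismatch>"
                + "</Parameters></BlastOutput_param></BlastOutput>";
        System.out.println(read(xml)); // prints {expect=10, sc-match=1, sc-mismatch=-2}
    }
}
```

Values are left as strings; converting them to typed bean fields would be a separate step once the set of parameters of interest is known.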
>>
>>
>
