[Biojava-l] status? Blast parser?

Matthew Pocock mrp@sanger.ac.uk
Wed, 16 Feb 2000 10:27:44 +0000


Hi.

Simon Brocklehurst wrote:

> Dear All,
>
> Having previously posted that CAT wouldn't be contributing to biojava,
> I'm happy to say that we now have the OK to contribute code that we
> regard as precompetitive.  Parsers for things like Blast and Hmmer
> certainly fall into that category.

This is great.

>
>
> We'd be happy to contribute this functionality to biojava (our systems
> extract essentially all the information from the output of these types of
> common bioinformatics software).  Before we release any code, however,
> I'd like to see some feedback on the list about:
>
> a) Do people actually want parsers for Blast, Hmmer etc?
>

I think everybody will find these parsers useful - this is currently a great
strength of bioperl.

>
> If the answer is yes:
>
> b) What do people want to use them for - we'd like to make them work out
> of the box for people if possible.  What we could go with pretty much
> right away (subject to finding the time to package it all up in a
> suitable form for biojava) are classes for getting all the information
> out that you can use in your own systems, and classes designed to
> produce marked up HTML etc.
>
> It would be good to have some use cases from people out there (Jared?)
>

My personal bias for these parsers is to build a model like the XML SAX/DOM
division of labour. So a blast parser would fire events for hits,
alignments and so on, and simple apps can just listen for the items they want
(blast version and top 10 hits, for example), or you could create a blast
document object from this event stream and then manipulate the whole
report. The event layer might not use any biojava concepts at all, whereas
the blast document would be composed from a SequenceDB object, sequence
objects, locations, features and loads of other data structures. Of course,
you may have done all of this in a completely different way...
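
To make the split a bit more concrete, here is a rough Java sketch of the
idea. Everything in it is made up for illustration: the names
BlastEventListener, ToyBlastEventParser, TopHitsListener, BlastReport and
BlastReportBuilder are hypothetical and are not biojava classes, and the
tab-separated input is an invented stand-in, not real Blast output. It also
assumes hits arrive best-first, so "top N" just means "first N".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical event callbacks (the SAX-like layer); not a biojava interface.
interface BlastEventListener {
    void version(String version);
    void hit(String subjectId, double score);
}

// Toy event source. The tab-separated lines are an invented format,
// standing in for a real Blast report.
final class ToyBlastEventParser {
    static void parse(Iterable<String> lines, BlastEventListener listener) {
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length == 2 && fields[0].equals("VERSION")) {
                listener.version(fields[1]);
            } else if (fields.length == 3 && fields[0].equals("HIT")) {
                listener.hit(fields[1], Double.parseDouble(fields[2]));
            }
            // Anything else is silently ignored in this sketch.
        }
    }
}

// Simple app: keeps only the version and the first N hits it hears about
// (assumes the parser reports hits best-first).
final class TopHitsListener implements BlastEventListener {
    private final int limit;
    String version;
    final List<String> hitIds = new ArrayList<>();

    TopHitsListener(int limit) { this.limit = limit; }

    public void version(String version) { this.version = version; }

    public void hit(String subjectId, double score) {
        if (hitIds.size() < limit) {
            hitIds.add(subjectId);
        }
    }
}

// DOM-like layer: a whole-report object assembled from the same events.
final class BlastReport {
    String version;
    final List<String> hitIds = new ArrayList<>();
    final List<Double> scores = new ArrayList<>();
}

final class BlastReportBuilder implements BlastEventListener {
    final BlastReport report = new BlastReport();

    public void version(String version) { report.version = version; }

    public void hit(String subjectId, double score) {
        report.hitIds.add(subjectId);
        report.scores.add(score);
    }
}

class BlastEventDemo {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "VERSION\t2.0.10", "HIT\tseqA\t120.5", "HIT\tseqB\t98.0", "HIT\tseqC\t50.0");

        TopHitsListener top = new TopHitsListener(2);
        ToyBlastEventParser.parse(lines, top);
        System.out.println(top.version + " " + top.hitIds);

        BlastReportBuilder builder = new BlastReportBuilder();
        ToyBlastEventParser.parse(lines, builder);
        System.out.println(builder.report.hitIds.size() + " hits in full report");
    }
}
```

The point of the sketch is that the parser only knows about the listener
interface, so the lightweight listener and the report builder can share one
event stream without the event layer depending on any richer object model.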

Do you have a biojava account? If not, contact Ewan at
mailto:birney@ebi.ac.uk and he can set you up. I would love to have a gander
at your current code to get a feel for how much tweaking we are both going
to have to do.

Matthew

>
> Simon
> --
> Simon M. Brocklehurst, Ph.D.
> Head of Bioinformatics
> Cambridge Antibody Technology
> The Science Park, Melbourn, Cambridgeshire, UK
> http://www.CambridgeAntibody.com/
> mailto:simon.brocklehurst@CambridgeAntibody.com