[Biojava-l] SAX parser demo

David Huen smh1008 at cus.cam.ac.uk
Wed Jun 25 09:35:00 EDT 2003


On Wed, 25 Jun 2003, Russell Smithies wrote:

> Looks good, but it doesn't do what I need, though I don't think it was ever
> going to :-(
> 
> The blast XML data has loads of info in it (I guess that's the reason for the
> format) but I want to be able to get at individual tags, not just hits.  For
> example, some of the stats data (Statistics_entropy, Statistics_eff-space
> etc.) or other hit data (Hsp_align-len, Hsp_pattern-from etc.) instead of
> just hitID and e-value might be useful?
> I guess I'll have to implement some new bits (from
> SimpleSeqSimilaritySearchSubHit?) but not exactly sure where.
> 
Ah, OK.  I have picked up most but not all of the fields.

Hsp_align-len is picked up and placed in an alignmentSize attribute.

The others are not picked up, but it should not be difficult to parse
them and stuff them into the SAX output stream.  If a suitable fit with
the BlastLikeDataSetCollection.dtd can be achieved, it should be possible
to map them over readily.  If not, we will have to extend that DTD
appropriately without breakage.  However, not all the data can be mapped
to the SeqSimilarity stuff, so you may have to place a listener of your
own to handle those fields yourself.
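
As a rough illustration of the listener idea, here is a minimal sketch
that pulls the text of chosen tags straight out of raw BLAST XML.  It
uses only the JDK's own SAX classes, not the BioJava parser or its event
stream, and the class name BlastTagCollector and its collect() method are
hypothetical names made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of BioJava: collects the text content of
// selected elements from raw BLAST XML using the JDK's SAX API.
class BlastTagCollector extends DefaultHandler {
    private final Set<String> wanted;
    private final Map<String, List<String>> values = new LinkedHashMap<>();
    private StringBuilder text; // non-null only while inside a wanted element

    BlastTagCollector(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default parser is not namespace-aware, so qName is the tag name.
        if (wanted.contains(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one element's text in several chunks, so accumulate.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && wanted.contains(qName)) {
            values.computeIfAbsent(qName, k -> new ArrayList<>()).add(text.toString().trim());
            text = null;
        }
    }

    // Parses the XML string and returns tag name -> values in document order.
    static Map<String, List<String>> collect(String xml, Set<String> wanted) throws Exception {
        BlastTagCollector handler = new BlastTagCollector(wanted);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }
}
```

Calling BlastTagCollector.collect(xml, Set.of("Hsp_align-len",
"Statistics_entropy")) would return one list of values per tag, one entry
per occurrence.  Keeping a list per tag matters because the Hsp_* fields
repeat for every HSP in the report.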

I don't see Hsp_pattern-from in my XML output.  Do you have an output
file with it?  This parser was written by reverse engineering the
semantics from the output ;-).  I seem to recall that the semantics of
orientation were weird.

Regards,
David



