[Biojava-l] Partial HTML markup carried into XML by BlastLikeToXMLConverter

Jurgen F. Doreleijers jurgen@bmrb.wisc.edu
Mon, 16 Dec 2002 17:30:29 -0600


Hi,

I have enjoyed using the biojava.org API for parsing BLAST output. I have
discovered that the BLAST output sometimes contains HTML markup, e.g. in the
entry ref|NP_012821.1. The problem is that BlastLikeToXMLConverter carries this
markup over when it converts the data to XML. Because of truncation in the
BLAST output, the opening and closing tags don't always match, and that makes
the XML files invalid.

Question: is there a way to filter out the markup during parsing? Or do I need
to run a preprocessing pass that deletes all markup first?
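For the second option, a pre-run could look roughly like the minimal sketch below. It is a standalone filter and does not use any BioJava API. The class name BlastMarkupFilter and its methods are hypothetical. It assumes the markup consists of simple HTML tags that do not span lines (e.g. <a href="...">...</a>), and that a literal '<' inside a description line is always the start of a tag. Because every tag is removed independently, an opening tag whose closing tag was truncated away is dropped as well. A tag cut off at the end of a line is handled too. HTML entities such as &amp; are left untouched.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/**
 * Hypothetical pre-processing filter (not part of BioJava): writes a copy of a
 * BLAST report with HTML-style tags removed, so that unmatched or truncated
 * markup cannot reach the XML produced downstream.
 */
public class BlastMarkupFilter {

    // A complete tag on one line, such as <a href="..."> or </a>.
    private static final Pattern COMPLETE_TAG = Pattern.compile("<[^<>]*>");

    // A tag cut off at the end of the line, e.g. <a href="http://ww
    // Assumption: a '<' with no closing '>' on the same line is never real text.
    private static final Pattern TRUNCATED_TAG = Pattern.compile("<[^<>]*$");

    /** Removes complete tags first, then any tag left dangling at end of line. */
    public static String stripMarkup(String line) {
        String noTags = COMPLETE_TAG.matcher(line).replaceAll("");
        return TRUNCATED_TAG.matcher(noTags).replaceAll("");
    }

    /** Copies {@code in} to {@code out} line by line, stripping markup. */
    public static void filterFile(Path in, Path out) throws IOException {
        // ISO-8859-1 maps every byte to a char, so unexpected bytes in the
        // report pass through unchanged instead of raising a decoding error.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripMarkup(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BlastMarkupFilter <blast-report> <cleaned-report>");
            System.exit(1);
        }
        filterFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The cleaned file would then be handed to the existing parsing and conversion step in place of the original report. The trade-off of stripping per line is that it keeps the filter trivial. However, it would miss a tag that wraps across two lines, so it is worth checking a few real reports before relying on it.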

Thanks,
Jurgen

---
Jurgen F. Doreleijers, Ph.D.
CESG/BMRB, Univ. of Wisconsin-Madison, WI, USA
mailto:jurgen@bmrb.wisc.edu