[BioPython] parsing with Martel
Jay Hesselberth
jhessel@mail.utexas.edu
12 May 2002 14:29:01 -0500
I've been playing around with Martel, and have written a couple of
parsers that work pretty well. I'm running into a problem, however, in
that the format I'm parsing (from the FASTA alignment program) has a lot
of cruft that I don't want / need parsed. I've written the parser so
that it just sticks tags around the crucial data, but when I actually
parse the file, the unwanted stuff (parentheses, etc.) is screwing up
the parse. I'm not sure exactly why, but if I've got something like:
<tag> DATA </tag>;
where the semicolon at the end is unwanted, the semicolon ends up in a
TEXT node in the parsed XML. I'm a bit confused about this, as I was
(naively) under the impression that things like xml.sax.ContentHandler
don't care about untagged stuff.
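To make the behaviour concrete, here is a minimal sketch using only the standard library's xml.sax (no Martel involved). The <record> wrapper element and the ShowText class are names made up for illustration, and the snippet assumes a current Python 3. It suggests that characters() is called for every run of text, including text that sits outside the inner tag:

```python
import xml.sax


class ShowText(xml.sax.ContentHandler):
    """Hypothetical handler that records every SAX event it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Called for all character data, whether or not it lies
        # inside an element the caller is interested in.
        self.events.append(("text", content))


handler = ShowText()
# <record> is an assumed wrapper; the semicolon is the unwanted cruft.
xml.sax.parseString(b"<record><tag> DATA </tag>;</record>", handler)

# Text may arrive in several chunks, so join before inspecting it.
text_seen = "".join(value for kind, value in handler.events if kind == "text")
```

So the semicolon does reach the handler as character data; it is up to the handler to decide whether to keep it.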
I guess what I would like to do is be able to post-filter the output,
removing everything that remains untagged after converting the file to
XML. Is there a built-in mechanism for this?
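In case it helps frame the question, the kind of post-filter I have in mind could be sketched roughly like this with plain xml.sax. This is an assumption-laden sketch, not a Martel feature: TaggedTextOnly, extract_tagged, the "tag" element name, and the <record> wrapper are all hypothetical, and it assumes Python 3. It keeps character data only while the innermost open element is one of the wanted tags, and drops everything else:

```python
import xml.sax


class TaggedTextOnly(xml.sax.ContentHandler):
    """Hypothetical filter: collect text only from wanted elements."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)   # element names whose text we keep
        self.stack = []             # currently open elements
        self.buffer = []            # text chunks for the current wanted element
        self.results = []           # (element name, text) pairs

    def startElement(self, name, attrs):
        self.stack.append(name)
        if name in self.wanted:
            self.buffer = []

    def characters(self, content):
        # Ignore text unless the innermost open element is wanted.
        if self.stack and self.stack[-1] in self.wanted:
            self.buffer.append(content)

    def endElement(self, name):
        if name in self.wanted:
            self.results.append((name, "".join(self.buffer).strip()))
        self.stack.pop()


def extract_tagged(xml_bytes, wanted):
    """Parse xml_bytes and return the text of the wanted elements only."""
    handler = TaggedTextOnly(wanted)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


# Parentheses and the semicolon are untagged and get dropped.
fields = extract_tagged(b"<record>(<tag> DATA </tag>);</record>", ["tag"])
```

Nested wanted elements are not handled here; the sketch only covers the flat case described above.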
Thoughts / Ideas / Suggestions ???
Jay
--
______________________________________________________
Jay Hesselberth
University of Texas 2500 Speedway
MBB 3.424 / A4800 Austin, TX 78712
phone: 512-471-6445 email: jhessel@ellingtonlab.org