[Biopython-dev] rebase

Jeffrey Chang jchang at SMI.Stanford.EDU
Mon Jul 31 18:41:31 EDT 2000


[Jeff, saying that biopython has tools to strip HTML]

[Cayte]
>    The consumer decorator doesn't solve the problem, because it occurs in the
> _Scanner.

I'm not sure what problem you're trying to solve...


>    SGMLHandle works, except that the linefeeds are placed in such a way
> that there may be no separation between a keyword and data from a
> previous field.

Ah, yes, that's true.  If you have text fields that are separated only by
HTML tags, then it's insufficient to just strip the tags because then
you'd have no separator.
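
To make the failure concrete, here is a minimal sketch. It is not the
Bio.File code: it uses the standard library's html.parser instead of
sgmllib, and TagStripper, strip_tags, and the sample record are hypothetical
names made up for illustration. It shows adjacent fields fusing when tags
are simply dropped, and one possible workaround that emits a separator
wherever a tag was.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical stripper: keeps text, emits `separator` in place of each tag."""

    def __init__(self, separator=""):
        super().__init__()
        self.separator = separator
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Replace the opening tag with the separator (empty by default).
        self.chunks.append(self.separator)

    def handle_endtag(self, tag):
        # Replace the closing tag with the separator as well.
        self.chunks.append(self.separator)

    def handle_data(self, data):
        # Text between tags is kept verbatim.
        self.chunks.append(data)


def strip_tags(html, separator=""):
    """Return `html` with tags removed, each tag replaced by `separator`."""
    parser = TagStripper(separator)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Made-up record whose fields are separated only by tags.
record = "<td>ID</td><td>X123</td><td>NAME</td>"

# Plain stripping fuses the keyword with the neighbouring data.
print(strip_tags(record))  # IDX123NAME

# Emitting a space per tag, then collapsing runs of whitespace, keeps fields apart.
print(" ".join(strip_tags(record, " ").split()))  # ID X123 NAME
```

Whether a space or a newline is the right separator depends on what the
scanner downstream expects, so treat the choice here as a placeholder.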

> As an experiment, I hacked handle_data in a copy of File.py and I was
> able to solve the problem.  But to do it cleanly in production code, I
> would need to be able to pass my own parser to SGMLStripper, as an
> optional parameter.  The alternative would be to subclass both
> SGMLStripper and SGMLHandle, because handle_data is deeply buried in
> these classes.

Allowing an optional parser parameter for the SGMLStripper seems like the
way to go.  I'll fix the File.py file.
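
Roughly, the shape I have in mind is sketched below. This is only an
illustration under assumptions, not the actual File.py change: Stripper,
_DefaultParser, and NewlineParser are hypothetical names, and the sketch is
built on the standard library's html.parser rather than sgmllib. The point
is that the stripper takes an optional parser argument, so a caller can
override handle_data without subclassing the stripper itself.

```python
from html.parser import HTMLParser


class _DefaultParser(HTMLParser):
    """Hypothetical default parser: keeps text, drops tags."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so self.data always exists.
        super().reset()
        self.data = []

    def handle_data(self, data):
        self.data.append(data)


class NewlineParser(_DefaultParser):
    """Hypothetical custom parser: ends every text chunk with a newline."""

    def handle_data(self, data):
        self.data.append(data + "\n")


class Stripper:
    """Hypothetical stand-in for a stripper that accepts an optional parser."""

    def __init__(self, parser=None):
        # Fall back to the default behaviour when no parser is supplied.
        self._parser = parser if parser is not None else _DefaultParser()

    def strip(self, text):
        self._parser.reset()      # clear state left over from earlier calls
        self._parser.feed(text)
        self._parser.close()      # flush any buffered trailing text
        return "".join(self._parser.data)


print(repr(Stripper().strip("<b>ID</b>X123")))                 # 'IDX123'
print(repr(Stripper(NewlineParser()).strip("<b>ID</b>X123")))  # 'ID\nX123\n'
```

The default path behaves as before, and only callers who pass their own
parser see different output, which is why the optional parameter looks less
invasive than subclassing both classes.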

Jeff



