[Biopython-dev] Martel changes

Jeffrey Chang jchang at smi.stanford.edu
Wed Dec 12 13:07:27 EST 2001


On Wed, Dec 12, 2001 at 03:21:59AM -0700, Andrew Dalke wrote:
> Is anyone using the iterator facility in Martel?

Yes.  I'm using it in Bio/Medline/NLMMedlineXML to parse the
XML-formatted PubMed records.  Each XML file contains about 30000
records and is too big to keep in memory at once.

> I would like to change the API.  Currently you pass
> it the factory function which produces SAX handlers.
> I would rather just pass it a SAX handler, and
> trust the handler to reset itself properly with the
> startDocument/endDocument methods.  (Those which
> don't can easily be wrapped.)
> 
> The problem with the current API is that when the handler
> needs parameters, you need to create something
> which passes those parameters to each instance.  It's
> ugly, and it's common... I think.  I also don't like
> that the object is created for every record instead
> of reusing the existing one.

Sure.  Let me know if you do it, so that I can update my files
accordingly.  I don't think it'll be hard to handle what you describe.
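
For illustration, here is a minimal sketch of the handler style described
above, written against the standard library's xml.sax rather than Martel's
iterator interface.  The RecordCounter class and count_per_record function
are hypothetical names invented for this example.  The handler takes its
parameter once, at construction, and clears its per-record state in
startDocument, so a single instance can be reused for every record.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RecordCounter(ContentHandler):
    """Hypothetical handler: counts elements with a given tag name."""

    def __init__(self, tag):
        ContentHandler.__init__(self)
        self.tag = tag      # parameter supplied once, at construction
        self.count = 0

    def startDocument(self):
        # Reset per-record state so the same instance can be reused.
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_per_record(records, handler):
    """Parse each record (bytes of XML) with the same handler instance."""
    results = []
    for record in records:
        xml.sax.parseString(record, handler)
        results.append(handler.count)
    return results


records = [b"<r><a/><a/></r>", b"<r><a/></r>"]
counts = count_per_record(records, RecordCounter("a"))
```

With a factory-based interface, the same parameter would have to be
captured in a closure or wrapper, such as lambda: RecordCounter("a"),
and a new object would be built for each record.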

> Has anyone started building up a collection of those
> common patterns?  I've got Integer, SignedInteger, Float,
> Word, and Whitespace.  I'll probably add Spaces (for
> only " "), NonSpaces (up to a " ").

Sounds good.  Looking through my code, other ones I use are Digits
(a more general name for Integer), Punctuation, and Unprintable
(defined as AnyBut(string.printable)).

Actually, could you make more general equivalents of some of the
names?  For example, presumably Digits and Integer would match the
same things, but a lot of times you want to match some numerical
characters and calling it an integer might be a tad confusing...
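
To make the naming question concrete, here is a rough sketch using
Python's re module as a stand-in.  These are not Martel expressions, and
the exact strings each name should match are my assumptions (for
instance, that Punctuation follows string.punctuation).

```python
import re
import string

# Stand-ins written with `re`; these are not Martel expressions.
Digits = r"[0-9]+"                  # any run of numerical characters
Integer = Digits                    # same pattern, more specific name
SignedInteger = r"[+-]?" + Digits   # optional leading sign
Spaces = r" +"                      # only " ", not tabs or newlines
NonSpaces = r"[^ ]+"                # everything up to a " "
Punctuation = "[" + re.escape(string.punctuation) + "]"
Unprintable = "[^" + re.escape(string.printable) + "]"

# The general name reads better when the digits are not a number as such,
# e.g. the digit run inside a hypothetical identifier field.
identifier_field = re.compile("ID" + Spaces + "(" + Digits + ")")
match = identifier_field.match("ID   00012345")
digits_found = match.group(1)
```

Here Digits and Integer are the same pattern; only the name tells the
reader whether the field is meant as a number.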

Jeff


