[Biopython-dev] swissprot parsing performance comparisons

Wed Jan 10 13:16:46 EST 2001

Jeff:
>What about PySAT?
>http://www.embl-heidelberg.de/~chenna/PySAT/

Thanks for the reminder.  I have the distribution and I'll
test it out as well.

>They have support for SwissProt, and their toolkit has been
>published.  IIRC, theirs is an example of a less stringent python
>implementation of a parser.

I recall looking at their code and I agree.  It is more like
the Swissknife way of doing things.

>This is an interesting statistic, and surprises me.  I wonder what's
>slowing the perl parser, then, since it doesn't use callbacks?

The Perl implementations do a lot of small regex parsing.  Martel
matches everything at once, and at the C level.  That might be the
difference.  It is hard to tell, since I would need to understand
the details of the Perl implementations better.
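
To illustrate the distinction, here is a minimal Python sketch of the
two styles.  It is not Martel's API and not the Perl code: the record,
the patterns, and the function names are all made up for the example,
and the record is a heavily simplified SwissProt-like entry.

```python
import re

# Hypothetical, heavily simplified SwissProt-like record (not a real entry).
RECORD = (
    "ID   TEST_HUMAN\n"
    "AC   P00001;\n"
    "DE   EXAMPLE PROTEIN.\n"
    "//\n"
)

# Style 1: many small regexes, one per line type, applied line by line.
# Each line costs a trip through the interpreter loop plus several match calls.
LINE_PATTERNS = {
    "id": re.compile(r"^ID   (\S+)"),
    "ac": re.compile(r"^AC   (\S+);"),
    "de": re.compile(r"^DE   (.+)\.$"),
}


def parse_line_by_line(text):
    fields = {}
    for line in text.splitlines():
        for name, pattern in LINE_PATTERNS.items():
            m = pattern.match(line)
            if m:
                fields[name] = m.group(1)
                break  # unrecognized lines (such as "//") are silently skipped
    return fields


# Style 2: one pattern describing the whole record, matched in a single call,
# so the scanning happens inside the regex engine rather than in a Python loop.
WHOLE_RECORD = re.compile(
    r"ID   (?P<id>\S+)\n"
    r"AC   (?P<ac>\S+);\n"
    r"DE   (?P<de>.+)\.\n"
    r"//\n"
)


def parse_all_at_once(text):
    m = WHOLE_RECORD.match(text)
    if m is None:
        raise ValueError("record does not match the expected format")
    return m.groupdict()


if __name__ == "__main__":
    print(parse_line_by_line(RECORD))
    print(parse_all_at_once(RECORD))
```

Both functions extract the same fields from this record.  Note that the
single-pattern version is also the stricter one: a record with a
missing or reordered line fails outright, while the line-by-line
version quietly returns whatever it recognized.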

>That's embarrassing, since it's supposedly been checked against it!  How
>many entries in release 38?  Perhaps I need to update mine.

I don't know.  I ran it and it failed at a record.  I figured out
what was wrong with that record, changed the 1 to 0, and then
everything parsed fine.

>It does seem to match the philosophies of the languages...

True enough, although as you mentioned PySAT is less stringent
and more like the Perl implementations.  Something to ponder :)

                    Andrew
                    dalke at acm.org