[Biopython-dev] Martel-0.5

Andrew Dalke dalke at acm.org
Tue Jan 9 00:02:32 EST 2001


Cayte <katel at worldpath.net>:
>   Does the current version of Martel support backtracking?

Sadly, no more than it ever did.  There is no backtracking
with the "*" operator.  I haven't figured out how to get
mxTextTools to support it.  But so far there have been ways
around it.
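
To make "no backtracking" concrete, here is a rough sketch in plain Python. It is not Martel or mxTextTools code; the helper match_star_then_literal is a hypothetical stand-in that only imitates a "*" which keeps everything it grabbed. It shows the failure and the usual way around it, which is to exclude the terminating character from the repeated class.

```python
import re

def match_star_then_literal(char_class, literal, text):
    """Match char_class* followed by literal, WITHOUT backtracking.

    The star takes every character it can and never gives one back.
    Returns the end offset of the match, or None on failure.
    (Hypothetical helper for illustration; not part of Martel.)
    """
    pos = 0
    while pos < len(text) and re.fullmatch(char_class, text[pos]):
        pos += 1
    if text.startswith(literal, pos):
        return pos + len(literal)
    return None

text = "Hs.222015"

# ".*" swallows the whole string, so the literal "." has nothing left.
stuck = match_star_then_literal(".", ".", text)

# Way around it: repeat "anything except the terminator" instead.
workaround = match_star_then_literal("[^.]", ".", text)

# Python's own re engine backtracks, so ".*\." works there as written.
backtracking = re.match(r".*\.", text)

print(stuck, workaround, backtracking.end())
```

The first call fails because the star never gives back the "." it consumed; the second succeeds because the star stops on its own just before the terminator.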

>  The parser gets stuck on this line:
>UniGene Cluster Hs.222015
>
>  The expression is:
>unigene_title = Martel.Group( "unigene_title", Martel.Str(
> "UniGene Cluster " ) +
>    Martel.Re( "[A-Z]" ) + Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) +
>    Martel.AnyEol() )
>
>  After this it goes into a loop until it runs out of characters.

I can't see why it would do that there.  Every operation must
consume at least a character so it can't be stuck in an infinite
loop.  The only operator to consume newlines is the AnyEol so
at most it should read up until the end of a line.

Have you tried using the make_parser(debug_level = 2) option to
see which operation is consuming characters?

Also, you can merge the Re operations into one, as in
  Martel.Re(r"[A-Z][a-z]\.\d+") + Martel.AnyEol()

or even use \R at the end of the pattern to replace the AnyEol.
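
If you want to sanity-check the merged pattern outside Martel, a quick sketch with Python's standard re module looks like this. It assumes the pattern means the same thing in both places, spells the end of line out as an explicit alternation (re has no \R), and uses a group name "id" that is only for illustration.

```python
import re

# Stand-in for Martel.Str(...) + Martel.Re(...) + Martel.AnyEol(),
# written for the standard re module rather than Martel itself.
# The newline alternation is an assumption about which line endings
# should count as an end of line.
UNIGENE_TITLE = re.compile(
    r"UniGene Cluster (?P<id>[A-Z][a-z]\.\d+)(?:\r\n|\r|\n)"
)

line = "UniGene Cluster Hs.222015\n"
m = UNIGENE_TITLE.match(line)

print(m.group("id"))         # the cluster identifier
print(m.end() == len(line))  # True if the whole line was consumed
```

If the whole line is consumed here, the pattern itself is fine and the problem is more likely in a neighboring part of the format definition.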

I just tested your expression out and it seems to work fine for
me.  Here's what I did:

>>> import Martel
>>> unigene_title = Martel.Group( "unigene_title",
...     Martel.Str( "UniGene Cluster ") + Martel.Re( "[A-Z]" ) +
...     Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) + Martel.AnyEol())
>>> parser = unigene_title.make_parser()
>>> from Martel.test import support
>>> parser.setContentHandler(support.Dump())
>>> parser.parseString("UniGene Cluster Hs.222015\n")
-------> Start
<unigene_title>UniGene Cluster Hs.222015
</unigene_title>
-------> End

If you still can't get it working, email me what you have and
I'll take a closer look at it.

                    Andrew




