[Biopython-dev] Martel-0.4 available

Brad Chapman chapmanb at arches.uga.edu
Wed Dec 6 03:50:55 EST 2000


Hi Andrew;
Sorry I haven't had a chance to comment on new Martel features yet
-- I have a bit of feedback in the areas you mentioned based on 
working with it for writing the GenBank parser.

>   New regexp syntax - \R
>      \R    means "\n|\r\n?"
>      [\R]  means "[\n\r]"
> 
>   New Expression Node - AnyEOL
>      implements the \R test

In general, the \R syntax worked great for me. I'm not a regexp purist 
or anything, so I have no issues with adding this. The new feature of
being able to handle any kind of line ending is very nice. One thing
that I ended up doing was not using the AnyEOL test at all, and
instead only using the \R syntax. As I started using it I realized
how nice it was to be able to embed the \R inside of any regular
expression, so I ended up only using \R to be consistent (that is, I
used Martel.Re("\R") to detect ends of lines). Just thought I would
mention it in case it is helpful to you. But in general, \R seems
great to me.
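
To make the two quoted definitions concrete, here is a rough equivalent
using Python's standard re module rather than Martel itself. It is only
a sketch: ANY_EOL, EOL_CHARS and both helper functions are names made up
for this illustration, not part of Martel.

```python
import re

# Stand-ins for the two quoted definitions (hypothetical names).
ANY_EOL = r"(?:\n|\r\n?)"   # what \R means outside a character class
EOL_CHARS = r"[\n\r]"       # what [\R] means inside a character class


def split_lines(text):
    """Split text on Unix (\\n), DOS (\\r\\n) or old Mac (\\r) line endings."""
    return re.split(ANY_EOL, text)


def count_eol_chars(text):
    """Count individual newline characters, so a DOS ending counts twice."""
    return len(re.findall(EOL_CHARS, text))
```

The difference between the two forms shows up on DOS files: ANY_EOL
treats "\r\n" as one line ending, while EOL_CHARS sees two characters.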

I also thought it would be nice if the RecordReader would accept \R as 
a newline as well, so you could do something like
RecordReader.EndsWith(handle, "//\R"). Even further along these
lines, it would have been nice to be able to set the end with an
arbitrary regular expression. For GenBank, I would have wanted
"//[\R]+" (okay, I would have to escape those //'s, but I'm not sure
how many /s that would leave me with :-), so that the end would be
// plus an arbitrary number of newlines. I ran into problems with
files like the biojava genbank test file, where there are a bunch of
linefeeds at the end of the file, and the same thing could happen with
a file of cut'n'pasted records separated by differing numbers of
line breaks. I was able to get around this for GenBank by using
StartsWith(handle, "LOCUS"), but just thought I would mention the idea.
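
As an illustration of the "// plus any number of newlines" idea, here is
a small sketch in plain Python. It does not use Martel or its
RecordReader; RECORD_END and split_records are hypothetical names, and
the pattern is written for Python's own re module (where the slashes
need no escaping).

```python
import re

# Hypothetical end-of-record pattern: "//" at the start of a line,
# followed by one or more newline characters of any flavour.
# (?<![^\n\r]) means "preceded by a newline character or by nothing".
RECORD_END = re.compile(r"(?<![^\n\r])//[\n\r]+")


def split_records(text):
    """Return the records in text, each ending with a single '//' line."""
    records = []
    start = 0
    for match in RECORD_END.finditer(text):
        body = text[start:match.start()]
        # Normalize the terminator so extra blank lines are dropped.
        records.append(body + "//\n")
        start = match.end()
    # Anything non-blank left over is a final record with no terminator.
    if text[start:].strip():
        records.append(text[start:])
    return records
```

Because the run of newlines belongs to the terminator, blank lines
between records or at the end of the file never turn into empty records.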

>   RecordReaders rewritten to use mxTextTools to find record
> begin and end characters rather than using readline/readlines.

I have a quick question about mxTextTools importing -- you are now
importing with:

from mx import TextTools

When did it get an mx meta-directory? Is this a new version or anything 
fancy? It was no big deal, I was just curious.
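
If code has to run against either layout, a guarded import is one
option. This is a sketch that assumes older installs expose a top-level
TextTools module (as the question above implies); HAVE_TEXTTOOLS is a
name invented here.

```python
# Try the newer package layout first, then the older top-level module.
try:
    from mx import TextTools
except ImportError:
    try:
        import TextTools
    except ImportError:
        TextTools = None  # mxTextTools is not installed at all

# Callers can check this flag before relying on TextTools.
HAVE_TEXTTOOLS = TextTools is not None
```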

>   - how to make an iterator (would like a bit more feedback)

(pausing to read your other mails right now... thanks for the
feedback!)

One thing that I didn't use is a Martel based iterator -- I just stuck 
with the type of iterator that Jeff uses in other Biopython parsers 
but used the RecordReader to implement it. I'm not sure if it could be 
done in a better way with a Martel iterator...
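
For reference, the approach described above looks roughly like the
sketch below. It is written in current Python and uses no Martel code:
StartsWithReader is a hypothetical stand-in for a record reader, and the
next()-returns-None convention and the parser.parse(handle) call are
assumptions about the iterator style, not the actual Biopython API.

```python
import io


class StartsWithReader:
    """Hypothetical stand-in for a record reader: each record begins at
    a line starting with `marker` and runs up to the next such line."""

    def __init__(self, handle, marker):
        self._handle = handle
        self._marker = marker
        self._pending = handle.readline()

    def next(self):
        """Return the next record as a string, or None at end of input."""
        # Skip anything that comes before the first marker line.
        while self._pending and not self._pending.startswith(self._marker):
            self._pending = self._handle.readline()
        if not self._pending:
            return None
        lines = [self._pending]
        self._pending = self._handle.readline()
        while self._pending and not self._pending.startswith(self._marker):
            lines.append(self._pending)
            self._pending = self._handle.readline()
        return "".join(lines)


class Iterator:
    """Return one record per call to next(), or None when there are no more."""

    def __init__(self, handle, parser=None):
        self._reader = StartsWithReader(handle, "LOCUS")
        self._parser = parser  # assumed to offer parse(handle)

    def next(self):
        data = self._reader.next()
        if data is None:
            return None
        if self._parser is not None:
            return self._parser.parse(io.StringIO(data))
        return data

    def __iter__(self):
        # Lets the object be used in a for loop; stops at the first None.
        return iter(self.next, None)
```

Splitting on the start of each record means trailing blank lines simply
ride along with the preceding record instead of producing an empty one.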

BTW, the debug_level = 2 option on the parser is incredibly nice. It
really helps get at why a parse is failing and makes it much easier to 
correct the problem. I probably would still be pulling my hair out
trying to get the regexps right without this. Thanks!

Brad




