[Biopython-dev] Martel changes

Andrew Dalke adalke at mindspring.com
Fri Dec 14 07:22:18 EST 2001


Jeff:
>Oops, I just looked over the code.  I'm in fact not using the
>iterator, but the RecordReader.  Sorry about the confusion!

No problem, and fewer changes for you!

Me:
>> When do you use Unprintable?  When do you use Punctuation?

>I use them both for matching things in English text.  Sometimes the
>text contains unprintable characters from foreign character sets.

Okay, if you say it's useful, I'll add it.  What do you
define as punctuation?
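One possible answer, sketched in Python 3 under the assumption that "punctuation" means the ASCII set in the standard 'string.punctuation' and "unprintable" means anything outside 'string.printable' (which would cover characters from foreign character sets). The names 'PUNCTUATION', 'UNPRINTABLE' and 'classify' are hypothetical, not Martel's definitions:

```python
import re
import string

# Hypothetical definition: ASCII punctuation, i.e. !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")

# Hypothetical definition: anything outside ASCII printable characters
# (letters, digits, punctuation, whitespace) counts as "unprintable".
UNPRINTABLE = re.compile("[^" + re.escape(string.printable) + "]")

def classify(text):
    """Return (punctuation_count, unprintable_count) for text."""
    return len(PUNCTUATION.findall(text)), len(UNPRINTABLE.findall(text))

print(classify("Hello, world!"))  # (2, 0)
print(classify("caf\xe9."))       # (1, 1): the accented e is outside ASCII
```

Under this definition a Latin-1 accented letter is "unprintable", which may or may not be what the English-text use case wants.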

>> My 'Float' isn't very powerful, as it only understands
>> numbers of the form (with optional +/-)

>It gets pretty complicated, e.g.
>1.315E2.24

That's not a valid floating point number -- the exponent must
be an integer.
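To illustrate the integer-exponent rule, here is a sketch of a float pattern in Python 3. It is a hypothetical pattern, not Martel's 'Float': an optional sign, a mantissa such as "12", "12.", "12.5" or ".5", and an optional exponent made only of digits. Python's own float() agrees; float("1.315E2.24") raises ValueError.

```python
import re

# Hypothetical float pattern (not Martel's Float).
FLOAT = re.compile(r"""
    [+-]?                      # optional sign
    (?:\d+(?:\.\d*)?|\.\d+)    # mantissa: 12, 12., 12.5, or .5
    (?:[eE][+-]?\d+)?          # optional exponent, which must be an integer
    """, re.VERBOSE)

def is_float(text):
    """True if the whole string is a float in the form above."""
    return FLOAT.fullmatch(text) is not None

print(is_float("1.315E2"))     # True
print(is_float("1.315E2.24"))  # False: fractional exponent
```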

BTW, I'm working on a 'Time' submodule, which should make it
easier to parse time and date data structures.  The language
I used is based on strptime, plus some experimental extensions
to make it easier for me to use.

The idea is to make it easier to parse something like
  1970-08-22
using a pattern like
  %(4-year)-%m-%d
than having to write
  (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
all the time.
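For comparison, a Python 3 sketch of the hand-written regular expression next to the standard library's strptime, whose %Y, %m and %d directives are the ones the Time language builds on:

```python
import re
import time

# The hand-written pattern, with one named group per field.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = DATE.match("1970-08-22")
print(m.group("year"), m.group("month"), m.group("day"))  # 1970 08 22

# strptime: %Y is a four-digit year, %m a month, %d a day of the month.
t = time.strptime("1970-08-22", "%Y-%m-%d")
print(t.tm_year, t.tm_mon, t.tm_mday)  # 1970 8 22
```

Note that the regex only checks for digits, so it would also accept "1970-13-43"; strptime rejects that, but it returns a time struct rather than the named fields a Martel parser reports.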

(Plus, the patterns I use are stricter, in that you can't
use a day like "43".)
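A minimal sketch of that stricter style in Python 3, assuming day and month alternatives similar to the ones in the generated pattern; 'STRICT_DATE' and 'parse_date' are hypothetical names:

```python
import re

# Only 01-12 is a valid month and only 01-31 a valid day,
# so "43" fails at the regex level.
MONTH = r"(?P<month>0[1-9]|1[012])"
DAY = r"(?P<day>0[1-9]|[12][0-9]|3[01])"
STRICT_DATE = re.compile(r"(?P<year>\d{4})-" + MONTH + "-" + DAY)

def parse_date(text):
    """Return (year, month, day) as ints, or None if text doesn't match."""
    m = STRICT_DATE.fullmatch(text)
    if m is None:
        return None
    return int(m.group("year")), int(m.group("month")), int(m.group("day"))

print(parse_date("1970-08-22"))  # (1970, 8, 22)
print(parse_date("1970-08-43"))  # None
```

A regex alone still accepts "1970-02-30"; checking days per month would need a calendar-aware step after matching.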



For example (with judicious newlines added for clarity):

  >>> from Martel import Time
  >>> print Time.make_pattern("%m/%d/%Y")
  (?P<month?type=numeric>(0[0-9]|1[012]))/
  (?P<day?type=numeric>(0[1-9]|[12][0-9]|3[01]))/
  (?P<year?type=long>\d{4})
  >>>
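The idea behind make_pattern can be sketched as a small translator from %-directives to regex fragments. This is a hypothetical Python 3 illustration, not the Time.py code. It only handles %m, %d, %Y and %%, and it omits the "?type=..." part of the group names because that is Martel's extension and plain Python 're' rejects it:

```python
import re

# Hypothetical directive table; each entry becomes a named group.
DIRECTIVES = {
    "m": r"(?P<month>0[1-9]|1[012])",
    "d": r"(?P<day>0[1-9]|[12][0-9]|3[01])",
    "Y": r"(?P<year>\d{4})",
}

def directives_to_regex(fmt):
    """Expand %m, %d, %Y and %% in fmt; escape all other text literally."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            code = fmt[i + 1]
            if code == "%":
                parts.append("%")
            elif code in DIRECTIVES:
                parts.append(DIRECTIVES[code])
            else:
                raise ValueError("unsupported directive: %" + code)
            i += 2
        else:
            parts.append(re.escape(fmt[i]))
            i += 1
    return "".join(parts)

pattern = directives_to_regex("%m/%d/%Y")
m = re.fullmatch(pattern, "12/14/2001")
print(m.group("month"), m.group("day"), m.group("year"))  # 12 14 2001
```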

  >>> parser = Time.make_expression("%(Jan) %(year)\n").make_parser()
  >>> from xml.sax import saxutils
  >>> parser.setContentHandler(saxutils.XMLGenerator())
  >>> parser.parseString("Dec 2001\n")
  <?xml version="1.0" encoding="iso-8859-1"?>
  <month type="short">Dec</month> <year type="any">2001</year>
  >>>
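In the transcript, each named group arrives at the content handler as a SAX element, and the text between groups arrives as character data. The Python 3 sketch below reproduces that output by driving xml.sax.saxutils.XMLGenerator by hand from an ordinary regex match. It is not how Martel's parser works internally. The pattern, the 'emit_match' helper and the hard-coded "type" attributes are hypothetical, chosen to mirror the output above:

```python
import io
import re
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical pattern for "Dec 2001\n": abbreviated month, space, year.
MONTH_YEAR = re.compile(
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"(?P<year>\d+)\n")

def emit_match(text, out):
    """Send SAX events for a match: named groups become elements and the
    text between them becomes character data."""
    m = MONTH_YEAR.fullmatch(text)
    if m is None:
        raise ValueError("no match: %r" % text)
    gen = XMLGenerator(out, encoding="iso-8859-1")
    gen.startDocument()
    pos = 0
    for name, kind in (("month", "short"), ("year", "any")):
        start, end = m.span(name)
        gen.characters(text[pos:start])  # literal text before this group
        gen.startElement(name, AttributesImpl({"type": kind}))
        gen.characters(m.group(name))
        gen.endElement(name)
        pos = end
    gen.characters(text[pos:])  # the trailing newline
    gen.endDocument()

buf = io.StringIO()
emit_match("Dec 2001\n", buf)
print(buf.getvalue(), end="")
```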

It's nearly done; only about an hour of work is left.  After
that I need to add the useful patterns and the SimpleFields (or
whatever I decide to call it).  I should be able to finish it
by Friday... which is today.

The code is temporarily at
  http://www.biopython.org/~dalke/Time.py

but it depends on a new 'NullOp' Expression, not yet in CVS,
to implement the 'make_expression' function.

                    Andrew
                    dalke at dalkescientific.com
