[Biopython-dev] [Bug 1747] GenBank parser is very slow and memory hungry for large input files

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Wed Mar 9 21:39:47 EST 2005


http://bugzilla.open-bio.org/show_bug.cgi?id=1747





------- Additional Comments From dalke at dalkescientific.com  2005-03-09 21:39 -------
I think history has shown that the idea behind Martel, while interesting, has had
problems in its implementation that could only be fixed with a lot of effort.
Hand-written code to do the same parsing lacks that purity, but it is easier to
maintain and easier for a wider range of people to understand.

I also think the Martel grammars I developed were too nit-picky, and there are places
where they should perhaps have been a bit looser.

So I have no qualms with getting rid of Martel as the patcher suggests.

From an email I wrote recently on the topic, included here for the record:

Martel hasn't panned out as well as I had hoped.  I think
I know the reasons:
  - regexps are hard to write and debug
      Could be improved with some sort of development/
      testing environment

  - Martel's grammars are hard to edit
      When a grammar changes it's not possible to say "the
      new format is the old format but change this one
      bottom level node".  I'm actually considering
      switching over to a DOM-style description of the
      tree so I can use XSLT as the editing language.
      Except that I think XSLT's grammar is clumsy and ugly.

  - Martel needs everything in memory
      I implemented a hack to parse a record at a time, but
      it's a hack and fails (except on large-memory machines)
      for people who want to read a chromosome at a time.
      I would also like it to be feed-based instead of
      pull-based.
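
To illustrate the feed-based versus pull-based distinction in the last point, here is a
minimal Python sketch. It is not Martel or Biopython code: the RecordFeeder class, its
feed() and close() methods, and the on_record callback are hypothetical names made up
for this example. The only assumption about the format is that a GenBank record ends
with a line containing just "//". The caller pushes text chunks in as they arrive, and
only the record currently being assembled is buffered.

```python
class RecordFeeder:
    """Hypothetical push-style record splitter (illustration only)."""

    def __init__(self, on_record):
        self._on_record = on_record  # callback invoked once per complete record
        self._pending = ""           # trailing partial line from the last chunk
        self._lines = []             # lines of the record being assembled

    def feed(self, chunk):
        """Accept the next chunk of text; emit any records it completes."""
        data = self._pending + chunk
        lines = data.split("\n")
        self._pending = lines.pop()  # last piece has not seen its newline yet
        for line in lines:
            self._lines.append(line)
            if line.rstrip() == "//":  # assumed GenBank record terminator
                self._on_record("\n".join(self._lines) + "\n")
                self._lines = []

    def close(self):
        """Flush whatever is left, e.g. a final record with no terminator."""
        if self._pending:
            self._lines.append(self._pending)
            self._pending = ""
        if any(line.strip() for line in self._lines):
            self._on_record("\n".join(self._lines) + "\n")
        self._lines = []


if __name__ == "__main__":
    seen = []
    feeder = RecordFeeder(seen.append)
    # Chunk boundaries need not line up with line or record boundaries.
    for chunk in ("LOCUS A\nORIGIN\n/", "/\nLOCUS B\n", "//\n"):
        feeder.feed(chunk)
    feeder.close()
    print(len(seen), "records")
```

Note that this only addresses the interface and avoids holding the whole file at once.
A single chromosome-sized record would still be buffered in full, so it does not solve
the large-record case described above.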





