[Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Apr 29 00:35:11 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3069





------- Comment #4 from laserson at mit.edu  2010-04-28 20:35 EST -------
Actually, the record I attached fails, but it's not the worst-case scenario.
With the extended feature-key length, some keys run right up to the column
where the qualifiers start, so that the key and the text after it are
contiguous.  This means that the indentation must be hardcoded for IMGT, just
as it is for the other formats.

In order to solve this problem once and for all, is the best approach to
subclass the InsdcScanner and put in values that make sense for IMGT?
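
A minimal sketch of the subclassing idea follows. It uses stand-in classes
rather than Biopython's own scanner: the class names, the method name, and the
value 25 are assumptions made for illustration. Only the attribute name
FEATURE_QUALIFIER_INDENT and the standard EMBL value of 21 are meant to mirror
Bio.GenBank.Scanner.

```python
class EmblLikeScanner:
    """Stand-in for an EMBL-style scanner (hypothetical, not Biopython's class)."""

    # Column at which locations and qualifiers start in standard EMBL records.
    FEATURE_QUALIFIER_INDENT = 21

    def split_feature_line(self, line):
        """Split an 'FT' line into (feature key, remainder) by fixed column."""
        indent = self.FEATURE_QUALIFIER_INDENT
        # Slice by column, not by whitespace: a long key may run right up to
        # the location with no separating space.
        return line[2:indent].strip(), line[indent:].rstrip()


class ImgtLikeScanner(EmblLikeScanner):
    """Same parsing logic, wider feature-key column."""

    # Assumed value for the over-indented records; check against real data.
    FEATURE_QUALIFIER_INDENT = 25


# A made-up key that fills the whole key column, leaving no space before
# the location (the worst case described above).
line = "FT   " + "K" * 20 + "1..300"
print(ImgtLikeScanner().split_feature_line(line))
```

Because the key is cut out by column position, the contiguous case needs no
special handling once the indent value is right.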

If so, then there is one more problem that needs to be addressed.  About 80% of
the records in IMGT conform to the EMBL format correctly, while about 20% have
this over-indentation problem.  Would it make more sense to go through the
entire IMGT database and rewrite every record to use the increased indentation?
Then the subclassed Scanner would have no problem.  The alternative is to
"discover" the amount of indentation for each record and handle it
accordingly, after which parsing would proceed as it currently does.
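
The "discover" alternative could be sketched roughly as below. This is an
illustration only, not code from Biopython: the function name and the
heuristic (take the most common column at which a "/qualifier" starts on FT
lines) are assumptions, and the fallback of 21 is the standard EMBL qualifier
column.

```python
import re
from collections import Counter

# A qualifier line in an EMBL-style feature table: "FT", spaces, then "/".
QUALIFIER_LINE = re.compile(r"^FT( +)/")


def discover_qualifier_indent(lines, default=21):
    """Guess the qualifier column of one record from its FT lines.

    Returns the most common column at which a "/qualifier" begins, or
    `default` if the record has no qualifier lines at all.
    """
    counts = Counter()
    for line in lines:
        match = QUALIFIER_LINE.match(line)
        if match:
            # Column = the two characters of "FT" plus the run of spaces.
            counts[2 + len(match.group(1))] += 1
    if not counts:
        return default
    return counts.most_common(1)[0][0]


# Two made-up records: one with the standard column, one over-indented.
standard = ["FT   " + "source".ljust(16) + "1..100",
            "FT" + " " * 19 + '/organism="example"']
wide = ["FT   " + "source".ljust(20) + "1..100",
        "FT" + " " * 23 + '/organism="example"']
print(discover_qualifier_indent(standard), discover_qualifier_indent(wide))
```

Voting over all qualifier lines, rather than trusting the first one, guards
against a wrapped qualifier value that happens to begin with a slash.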

Uri

  This leaves two options:

1) Go through each record in IMGT and enforce the longer indentation for each
such record.  (This shouldn't be too difficult).
2) Su




