[DAS2] feature group assembly; proposals for simplification

Erwin, Ed Ed_Erwin at affymetrix.com
Mon Sep 18 16:59:55 UTC 2006


 
I think the simplest solution for parsing is while parsing the file,
read objects F1,F2,F3,F4,F5 into memory but don't even try to hook-up
parents and children yet.  After finished reading the file, and getting
rid of all the memory overhead associated with XML-parsing, loop through
the objects that you've read and link parents to children.  Their order
no longer matters because they are all in memory, probably in a hashmap
linking ID to object.


-----Original Message-----
From: das2-bounces at lists.open-bio.org
[mailto:das2-bounces at lists.open-bio.org] On Behalf Of Andrew Dalke
Sent: Monday, September 11, 2006 6:44 PM
To: DAS/2
Subject: [DAS2] feature group assembly; proposals for simplification

I've been working on a writeback server.  It will verify that the
feature groups have no cycles, that if X is a parent to Y then Y
is a part of X, and that all groups has a single root.

I'm having a hard time with that, and harder than I expected.

GFF3 had only one direction of relationship.  As such it's impossible
to assemble a feature group until the end of the file or the
marker that no lookahead is needed.

We changed that in DAS2.  The feature xml is bidirectional so
in theory it's possible to know when the feature group is complete.
But it's tricky.  It's tricky enough that I want to change things
slightly so that people don't need to handle the trickiness.

The trickiness comes when parsing the list of features into
feature groups.  For example, consider

        [F3]
        /  \
    [F4]   [F2]
      |      |
    [F1]   [F5]

where the features are in the order F1, F2, ... F5.  After F2
the system looks like there are two feature groups.

           [F3?]
             |
    [F4?]  [F2]
      |      |
    [F1]   [F5?]

Only after F3 can those be merged together.  This requires some
non-trivial bookkeeping, unless I've forgotten something simple
from undergraduate data structures.

Of course it's simple if you know that a feature is the last feature
in a feature group either through reaching EOF or a special marker.
But then what's the point of having bidirectional links if the
result is no better than GFF3's only-list-parent solution.

If there is a simple algorithm, please let me know.





More information about the DAS2 mailing list