[Biojava-l] FeatureHolder.containsFeature()

David Huen smh1008@cus.cam.ac.uk
Thu, 4 Apr 2002 23:34:03 +0100 (BST)


On Fri, 5 Apr 2002, Schreiber, Mark wrote:

> I suppose two features from different sequences/feature holders could
> produce the same template but, like you say, there is no easy way around
> this. We could specify that features only look to see if they have the
> same chain of parents, ignoring children. However, a Sequence, if
> converted to a GenBank (or similar) file, loses all its nested feature
> hierarchy, so when the sequence is read back from the GB file it will have
> the same features but not necessarily nested as in the original. Thus, if
> parent-child relationships are taken into account, equality would fail
> here.
> 

Oh dear, this business is more complicated than I had first envisaged.

Perhaps we will have to settle for a solution no worse than current
performance.  With the in-memory FeatureHolder, containsFeature() on a
child feature will not survive a trip through the GenBank format anyway.
And if you created two Sequences with identical symbols, features, etc.,
containsFeature() always returns false if a feature on one sequence is
passed to containsFeature() on the other sequence, even if
conceptually they are the "same".
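
To make that failure concrete, here is a minimal sketch.  It assumes the
in-memory holder compares features by object identity; SimpleFeature and
SimpleHolder are hypothetical stand-ins invented for illustration, not
the BioJava interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a feature: just a type and a location.
class SimpleFeature {
    final String type;
    final int start;
    final int end;

    SimpleFeature(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical stand-in for an in-memory feature holder.
class SimpleHolder {
    private final List<SimpleFeature> features = new ArrayList<>();

    void add(SimpleFeature f) {
        features.add(f);
    }

    // Assumption: containment is decided by object identity (==),
    // so only the very same instance is ever found.
    boolean containsFeature(SimpleFeature f) {
        for (SimpleFeature g : features) {
            if (g == f) {
                return true;
            }
        }
        return false;
    }
}

class IdentityContainmentDemo {
    public static void main(String[] args) {
        SimpleHolder seqA = new SimpleHolder();
        SimpleHolder seqB = new SimpleHolder();

        // Two conceptually identical features, built independently.
        SimpleFeature geneOnA = new SimpleFeature("gene", 10, 200);
        SimpleFeature geneOnB = new SimpleFeature("gene", 10, 200);
        seqA.add(geneOnA);
        seqB.add(geneOnB);

        System.out.println(seqA.containsFeature(geneOnA)); // true
        System.out.println(seqB.containsFeature(geneOnA)); // false
    }
}
```

The second call is the case described above: the feature on seqB has
the same type and location, but it is a different object, so the
identity test cannot match it.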

Could I propose that, for DB-backed persistent objects, Feature objects
store the unique id of the feature within them, and containsFeature()
only returns true if the feature with that unique id is a child of
another feature whose unique id places it somewhere on the direct
ancestral lineage en route to the sequence object (yuck, what an awful
sentence!)?
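
One possible reading of this proposal, sketched under stated
assumptions: every persistent feature carries a DB-assigned id, and the
id of its parent feature is known (null when the parent is the sequence
itself).  FeatureLineage and its methods are hypothetical names made up
for this sketch; nothing here is existing BioJava API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index of persistent features: feature id -> parent feature id.
// A null parent id means the feature sits directly on the sequence.
class FeatureLineage {
    private final Map<Long, Long> parentOf = new HashMap<>();

    void register(long featureId, Long parentId) {
        parentOf.put(featureId, parentId);
    }

    // True if holderId lies on the chain of parents leading from
    // candidateId up towards the sequence.  Only ids are compared, so
    // the answer does not depend on which object instances are in hand.
    boolean containsFeature(long holderId, long candidateId) {
        if (!parentOf.containsKey(candidateId)) {
            return false; // unknown feature
        }
        Long current = parentOf.get(candidateId);
        int steps = 0;
        // The step limit guards against a corrupt, cyclic parent chain.
        while (current != null && steps++ < parentOf.size()) {
            if (current.longValue() == holderId) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }
}

class LineageDemo {
    public static void main(String[] args) {
        FeatureLineage lineage = new FeatureLineage();
        lineage.register(1L, null); // gene, directly on the sequence
        lineage.register(2L, 1L);   // transcript, child of the gene
        lineage.register(3L, 2L);   // exon, child of the transcript

        System.out.println(lineage.containsFeature(1L, 3L)); // true
        System.out.println(lineage.containsFeature(3L, 1L)); // false
    }
}
```

Because only ids are compared, two separately checked-out copies of the
same feature would give the same answer.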

This will survive any number of instances of that feature object and its
parent sequences being checked out of the SequenceDB, and this
survivability may become important when we consider what to do with
removeFeature() [i.e. with multiple instances checked out of a
DB-backed SequenceDB, should removeFeature() remove the feature in all
checked-out copies?  This bites with a vengeance if there's more than one
client machine connected to the same DB.  How do different machines know
if the features on a sequence you have checked out have changed?  Should
we even contemplate such an eventuality?]

Regards,
David Huen