[Biojava-dev] Problem with ranks
mark.schreiber at novartis.com
Tue Sep 12 03:37:55 UTC 2006
Hi George, thanks for raising these issues. We should fix these before
biojava 1.5 finishes its beta testing. See my responses below. Richard
Holland and David Scott will no doubt have comments too.
> I am having difficulty using ranking with some objects found in
> SimpleRichSequence. There are 6 objects contained in SimpleRichSequence
> which are found within collections, namely SimpleComment,
> SimpleRankedCrossRef, SimpleRankedDocRef, SimpleNote,
> SimpleBioEntryRelationship, and SimpleRichFeature. Each of them is
> associated with a TreeSet and uses ranking, to some extent, for comparison.
>
> Ranks are never described, but the name suggests that they are positive
> integers, in consecutive order, and not identical for similar objects
> within the same sequence. Here are some questions:
Ranks actually come from the BioSQL schema. They are used so that lists of
features, comments, etc. that are stored in database tables (or any other
collection) can be reassembled in the same order in which they are found in
the original flatfile (Genbank etc.). Simply put, they are used to preserve
order.
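To make the order-preserving role concrete, here is a minimal Java sketch.
RankedRow and RankOrderDemo are hypothetical names, not BioJava classes:
rows come back from a table in no guaranteed order, and sorting on rank
restores the order they had in the flatfile.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a stored row (e.g. a comment); only rank and text matter here.
final class RankedRow {
    final int rank;
    final String text;

    RankedRow(int rank, String text) {
        this.rank = rank;
        this.text = text;
    }
}

final class RankOrderDemo {
    // Rows may come back from the database in any order; sorting by rank
    // restores the order in which they appeared in the original flatfile.
    static List<String> reassemble(List<RankedRow> rowsFromDb) {
        List<RankedRow> sorted = new ArrayList<>(rowsFromDb);
        sorted.sort(Comparator.comparingInt(r -> r.rank));
        List<String> out = new ArrayList<>();
        for (RankedRow r : sorted) {
            out.add(r.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<RankedRow> rows = List.of(
                new RankedRow(3, "third comment"),
                new RankedRow(1, "first comment"),
                new RankedRow(2, "second comment"));
        System.out.println(reassemble(rows)); // [first comment, second comment, third comment]
    }
}
```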
> - Can rank be negative? We would assume not but this is never checked.
I suppose it could be, but it would make no sense given the above
description. We should probably document this in the javadocs and suggest
that classes enforce the non-negative rule.
> - If rank cannot be negative, where do they start, 0 or 1?
> SimpleBioEntryRelationship suggests that they start at 1, with 0 reserved
> for the absence of ranking.
At the moment this strictly depends on the creating object. Typically this
would be a RichSequenceFormat implementation. The Genbank format appears
to start numbering from either 0 or 1 (for comments). There should be a
common rule.
> - Are we expecting ranks to be in consecutive order (or reasonably
> consecutive order), or are values like 1000, 2000, etc. possible or even
> expected?
Is there any reason why we need to enforce this rule? It would be tidier,
but it would be a pain to have to re-order everything just because one
object is deleted. The Genbank parser currently numbers sequentially.
> - Can we have duplicate ranks? We would assume not, but the SimpleRichFeature
> javadoc indicates that equal ranks are *acceptable*.
Certainly all the RankedCrossRefs returned by the Genbank parser have the
same rank (0). It is possible as long as the objects are somehow unique:
if equals() is true, the objects are treated as the same and only one of
them is kept. I don't think any Ranked object currently relies only on rank
for equality (or for the compareTo() method either). The unit tests do a
pretty good job of testing equals() and compareTo() and making sure they
return logically equivalent values. Although duplicate ranks are possible,
they may not be desirable. Any thoughts?
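A minimal sketch of how duplicate ranks can coexist in a TreeSet, assuming
a hypothetical RankedXref class (not the real SimpleRankedCrossRef) whose
identity is a database name plus an accession. compareTo() orders by rank
and then breaks ties on the identifying fields, so it returns 0 exactly
when equals() is true.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical ranked cross-reference: rank gives the order, (db, accession) gives identity.
final class RankedXref implements Comparable<RankedXref> {
    // Rank first, then the identifying fields, so ties on rank are broken consistently.
    private static final Comparator<RankedXref> ORDER =
            Comparator.comparingInt((RankedXref x) -> x.rank)
                    .thenComparing(x -> x.db)
                    .thenComparing(x -> x.accession);

    final int rank;
    final String db;
    final String accession;

    RankedXref(int rank, String db, String accession) {
        this.rank = rank;
        this.db = Objects.requireNonNull(db);
        this.accession = Objects.requireNonNull(accession);
    }

    @Override
    public int compareTo(RankedXref o) {
        return ORDER.compare(this, o);
    }

    // Uses the same fields as compareTo, keeping the two consistent.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RankedXref)) return false;
        RankedXref x = (RankedXref) o;
        return rank == x.rank && db.equals(x.db) && accession.equals(x.accession);
    }

    @Override
    public int hashCode() {
        return Objects.hash(rank, db, accession);
    }
}
```

With this ordering, two cross-references that both have rank 0 but
different accessions both stay in a TreeSet, while adding an exact
duplicate returns false and leaves the set unchanged.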
> SimpleBioEntryRelationship's getRank method returns an Integer object; all
> the other objects return an int. Any reason for this?
I think Richard has a reason. Something to do with Hibernate?? Richard??
> Moreover, 3 of these objects do not have a setRank method: SimpleComment,
> SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in
> the middle of other comments? How do I change the order of these objects
> without creating new ones?
Possibly they should. Making things mutable is always tricky, but the other
objects with setRank methods register change listeners and have the option
of vetoing the change, so it can be done safely. The ChangeListener could
be in charge of re-ordering ranks if you insert into the middle.
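One possible shape for that re-ordering step, sketched with hypothetical
MutableRanked and Reranker classes (not BioJava API): insert at a list
position, then reassign consecutive ranks from a chosen starting value.
Since we have not settled on starting at 0 or 1, firstRank is a parameter.

```java
import java.util.List;

// Hypothetical mutable ranked item, standing in for a comment or note with setRank().
final class MutableRanked {
    private int rank;
    final String label;

    MutableRanked(int rank, String label) {
        this.rank = rank;
        this.label = label;
    }

    int getRank() {
        return rank;
    }

    void setRank(int rank) {
        this.rank = rank;
    }
}

final class Reranker {
    // Inserts item at position index (0-based) of a rank-ordered list, then
    // reassigns consecutive ranks starting at firstRank so the order is preserved.
    static void insertAndRerank(List<MutableRanked> ordered, int index,
                                MutableRanked item, int firstRank) {
        ordered.add(index, item);
        for (int i = 0; i < ordered.size(); i++) {
            ordered.get(i).setRank(firstRank + i);
        }
    }
}
```

This renumbers objects held in a List. Changing the rank of an object that
is already inside a TreeSet leaves it in the wrong place in the tree, so a
listener working on a TreeSet would have to remove each affected object,
update its rank and add it back.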
> All these objects have an ordering consistent with equals except
> SimpleRichFeature. SimpleRichFeatures are sorted by rank only. Its
> compareTo method also never returns 0. A consequence is that removeFeature
> in ThinRichSequence never works, because TreeSet uses compareTo for
> testing equality.
OK, that sounds like a bug that we have missed in the Unit tests. I will
report it to bugzilla and fix it when I have time.
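The failure can be reproduced in a few lines of plain Java. BrokenFeature
is a hypothetical stand-in that mimics the reported behaviour (ordering by
rank only and never returning 0); FixedFeature, also hypothetical, breaks
ties on a name field so that compareTo returns 0 for equal objects.

```java
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical feature mimicking the reported bug: ordered by rank only,
// and compareTo never returns 0, not even when an object is compared to itself.
final class BrokenFeature implements Comparable<BrokenFeature> {
    final int rank;

    BrokenFeature(int rank) {
        this.rank = rank;
    }

    @Override
    public int compareTo(BrokenFeature o) {
        return rank < o.rank ? -1 : 1; // never 0
    }
}

// Hypothetical fix: order by rank, then break ties on an identifying field,
// so compareTo returns 0 exactly when equals() is true.
final class FixedFeature implements Comparable<FixedFeature> {
    final int rank;
    final String name;

    FixedFeature(int rank, String name) {
        this.rank = rank;
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public int compareTo(FixedFeature o) {
        int c = Integer.compare(rank, o.rank);
        return c != 0 ? c : name.compareTo(o.name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FixedFeature)) return false;
        FixedFeature f = (FixedFeature) o;
        return rank == f.rank && name.equals(f.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(rank, name);
    }
}

final class RemoveFeatureDemo {
    public static void main(String[] args) {
        TreeSet<BrokenFeature> broken = new TreeSet<>();
        BrokenFeature b = new BrokenFeature(1);
        broken.add(b);
        // TreeSet looks b up with compareTo, gets 1 instead of 0 and never finds it.
        System.out.println(broken.remove(b)); // false

        TreeSet<FixedFeature> fixed = new TreeSet<>();
        FixedFeature f = new FixedFeature(1, "CDS");
        fixed.add(f);
        System.out.println(fixed.remove(f)); // true
    }
}
```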
> All compareTo methods use rank first except SimpleRankedDocRef, which does
> not use rank at all (even though it is ranked, as its name indicates).
We should change this. Another bugzilla report.
> A few objects are nearly identical when they are equal, but not all.
> SimpleNote compares by rank, then by term, but not by value. SimpleNotes of
> the same rank and term but different values are nevertheless equal.
> SimpleRankedDocRefs can be equal and have different locations? I can
> understand this.
This is because the term of a SimpleNote is an ontology term and should
therefore have only one value. Two Notes with the same term are therefore
the same (or should be). For example, if the term or keyword of the Note is
Organism:, there should only be one of these Notes.
> We need a clear definition of what ranks are, what the ordering they
> imply is intended for, and how to deal with duplicate ranks. Maybe we
> could have an interface that encapsulates the concept of ranking (e.g.
> interface Ranked, with methods setRank() and getRank()) and group all this
> information in the javadoc. It seems easier to derive exceptions from a
> common pattern than the opposite. Maybe we also need separate comparators
> when they are not consistent with equals.
I think we should have a 'Ranked' interface with clear rules in the
javadoc. I can't think of any good reason why compareTo() and equals()
should not be consistent. We should try to keep them consistent as much as
possible.
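A rough sketch of what such an interface could look like. Ranked and
RankedComment are hypothetical here, and the javadoc rules are only the
ones proposed in this thread (non-negative, not necessarily consecutive,
ties broken by other fields), not settled BioJava policy.

```java
import java.util.Comparator;
import java.util.Objects;

/**
 * Hypothetical interface sketching the rules proposed in this thread.
 * Ranks preserve the order in which objects appeared in the source
 * flatfile. They are non-negative and need not be consecutive. Equal
 * ranks are allowed only if the objects differ in some other field.
 */
interface Ranked {
    int getRank();

    /** @throws IllegalArgumentException if rank is negative */
    void setRank(int rank);

    /** Orders by rank only; implementations must break ties so compareTo stays consistent with equals. */
    Comparator<Ranked> BY_RANK = Comparator.comparingInt(Ranked::getRank);
}

// Hypothetical ranked comment following those rules.
final class RankedComment implements Ranked, Comparable<RankedComment> {
    private int rank;
    private final String text;

    RankedComment(int rank, String text) {
        this.text = Objects.requireNonNull(text);
        setRank(rank);
    }

    @Override
    public int getRank() {
        return rank;
    }

    // Enforces the non-negative rule. Do not call this while the object sits
    // inside a TreeSet: remove it first, change the rank, then add it back.
    @Override
    public void setRank(int rank) {
        if (rank < 0) {
            throw new IllegalArgumentException("rank must be non-negative, got " + rank);
        }
        this.rank = rank;
    }

    String getText() {
        return text;
    }

    // Rank first, then text, so compareTo returns 0 exactly when equals is true.
    @Override
    public int compareTo(RankedComment o) {
        int c = BY_RANK.compare(this, o);
        return c != 0 ? c : text.compareTo(o.text);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RankedComment)) return false;
        RankedComment c = (RankedComment) o;
        return rank == c.rank && text.equals(c.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(rank, text);
    }
}
```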
- Mark