[Biojava-dev] Problem with ranks

Richard Holland holland at ebi.ac.uk
Tue Sep 12 09:36:47 UTC 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Ranks are never described but the name suggests that they are positive integers, in consecutive order, and not identical for similar objects within the same sequence. Here are some questions:

Ranks in general are defined by BioSQL but, like much else in that
schema, they are not defined very well, so everyone has their own
interpretation of what should go where.

BioJava uses them in the way that I thought was most logical at the
time, but BioPerl often ignores them completely and populates them all
with zeroes. As BioJava can be connected to a database that may have
been populated by BioPerl, it has to cope with both of these
situations, and potentially many others.

It would be nice for all the Bio* projects to agree on exactly how to
store various bits of information in BioSQL, especially on how best to
represent specific file formats such as GenBank, but this is unlikely
given how rarely representatives of all the projects are in the same
place at the same time (basically only at BOSC, and even then not
always: there was nobody from BioJava there this year).

> - Can rank be negative? We would assume not but this is never checked.

Yes. It can be any integer you want.

> - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.

I tried to start them all from 1, and used 0 for no-rank where rank is
compulsory, and null where rank is optional (see below). If you find
any place where I've been inconsistent, please raise a Bugzilla bug
pointing out where I've gone wrong so I can fix it.

> - Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, 2000, etc. are possible or even expected?

They don't have to be consecutive.

> - Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

Yes, duplicates are fine.

> SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this?

In BioSQL, BioEntryRelationship has a nullable rank, whereas all other
ranked objects have non-null ranks. Hence I have to use an Integer
object here to be able to cater for the null case, as this cannot be
done with a plain int like the others.
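
To make the int/Integer distinction concrete, here is a minimal sketch
of handling a rank that may be null. NullableRank, ORDER and
toCompulsory are hypothetical names for illustration only, not part of
BioJava, and sorting unranked entries first is an assumption rather
than documented behaviour.

```java
import java.util.Comparator;

// Hypothetical helper, not part of BioJava: works with ranks that may be null.
final class NullableRank {
    // Assumption: unranked (null) entries sort before ranked ones.
    // Ranked entries sort by value; negative and duplicate ranks are allowed.
    static final Comparator<Integer> ORDER =
            Comparator.nullsFirst(Comparator.<Integer>naturalOrder());

    // Collapses an optional rank to the int convention used where rank
    // is compulsory: 0 stands for "no rank".
    static int toCompulsory(Integer rank) {
        return rank == null ? 0 : rank;
    }

    public static void main(String[] args) {
        System.out.println(ORDER.compare(null, 1) < 0);
        System.out.println(toCompulsory(null));
    }
}
```

A plain int has no way to say "absent", which is why the one nullable
column needs the wrapper type.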

> Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

This is a bug. They should be mutable and fire appropriate change events.
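
A rough sketch of what a mutable rank that notifies listeners could
look like. It uses java.beans.PropertyChangeSupport purely as a
stand-in for BioJava's own change-event classes, and RankedComment is
a hypothetical class, not SimpleComment itself.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical ranked comment; java.beans stands in for BioJava's
// change-event machinery here.
final class RankedComment {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private final String text;
    private int rank;

    RankedComment(String text, int rank) {
        this.text = text;
        this.rank = rank;
    }

    String getText() { return text; }
    int getRank() { return rank; }

    void addListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    // Updates the rank and tells listeners about the old and new values.
    // No event is fired when the value does not actually change.
    void setRank(int newRank) {
        int oldRank = this.rank;
        this.rank = newRank;
        changes.firePropertyChange("rank", oldRank, newRank);
    }
}
```

With non-consecutive ranks such as 1000 and 2000, a comment can then be
moved between two others with setRank(1500). If the object lives in a
sorted set, it has to be removed before the change and re-added
afterwards, since the set will not re-sort itself.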

> All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality.

This is another bug. compareTo, equals and hashCode should always work
with the same fields. In this case, compareTo leaves out a number of
those fields, and it shouldn't.
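
The removal failure is easy to reproduce outside BioJava. The sketch
below uses a hypothetical Feature class and two comparators: one that
looks at rank only and never returns 0 (mimicking the reported
behaviour), and one that is consistent with equals. None of these
names come from the BioJava source.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a ranked feature; not the BioJava class.
final class Feature {
    final int rank;
    final String name;

    Feature(int rank, String name) {
        this.rank = rank;
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Feature)) return false;
        Feature f = (Feature) obj;
        return rank == f.rank && name.equals(f.name);
    }

    @Override
    public int hashCode() {
        return 31 * rank + name.hashCode();
    }
}

final class TreeSetRemoveDemo {
    // Broken ordering: rank only, and never 0, so nothing is ever "found".
    static final Comparator<Feature> NEVER_ZERO =
            (a, b) -> a.rank < b.rank ? -1 : 1;

    // Consistent ordering: same fields as equals and hashCode.
    static final Comparator<Feature> CONSISTENT = (a, b) -> {
        int c = Integer.compare(a.rank, b.rank);
        return c != 0 ? c : a.name.compareTo(b.name);
    };

    // Adds two features of equal rank, then tries to remove the first.
    static boolean addThenRemove(Comparator<Feature> order) {
        TreeSet<Feature> set = new TreeSet<>(order);
        Feature gene = new Feature(1, "gene");
        set.add(gene);
        set.add(new Feature(1, "CDS"));
        return set.remove(gene);
    }

    public static void main(String[] args) {
        System.out.println(addThenRemove(NEVER_ZERO));  // false
        System.out.println(addThenRemove(CONSISTENT));  // true
    }
}
```

TreeSet locates an element by walking the tree until the comparator
returns 0, so with NEVER_ZERO the lookup runs off the end of the tree
even though the very same object is in the set.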

A word of warning though - when objects are loaded by Hibernate, often
they are instantiated and added to a set _before_ all the setXXX methods
are called to populate the various fields. Therefore, if you find nulls
in any of the fields required for comparison then you should assume the
object is still incomplete and return a non-zero result, to prevent the
object from accidentally replacing an existing object that matches the
fields populated so far.
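
A minimal sketch of that null guard, assuming a hypothetical RankedRef
class whose accession field is filled in by a setter after
construction. The class and its fields are illustrative only and are
not taken from BioJava.

```java
// Hypothetical ranked reference; the accession is null until a loader
// (Hibernate, in the scenario above) calls setAccession.
final class RankedRef implements Comparable<RankedRef> {
    private int rank;
    private String accession;

    void setRank(int rank) { this.rank = rank; }
    void setAccession(String accession) { this.accession = accession; }

    private boolean incomplete() {
        return accession == null;
    }

    @Override
    public int compareTo(RankedRef other) {
        if (this == other) return 0;
        // A half-populated object never compares equal to a different object.
        if (incomplete() || other.incomplete()) return -1;
        int c = Integer.compare(rank, other.rank);
        return c != 0 ? c : accession.compareTo(other.accession);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof RankedRef)) return false;
        RankedRef other = (RankedRef) obj;
        if (incomplete() || other.incomplete()) return false;
        return rank == other.rank && accession.equals(other.accession);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals and compareTo.
        return incomplete() ? 17 : 31 * rank + accession.hashCode();
    }
}
```

Returning -1 in both directions for incomplete objects deliberately
bends the compareTo contract; the trade-off is that a partly loaded
object cannot displace a complete one that happens to share its rank.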

> All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

Another bug. It should be using rank as well.

> A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations – I can understand this. 

SimpleNote is correct - two notes are equal if they have the same rank
and term.

SimpleRankedDocRef, however, is incorrect - it should include location
in the equals/compareTo/hashCode methods. Another bug then, but guard
against null locations during Hibernate loading, as described above.

If you or Mark can report all these to Bugzilla, then one of us will get
round to fixing them before the end of the beta testing. (Reporting them
to Bugzilla makes a nice todo list which is far more reliable than me
trying to keep track of everything on paper...).

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFBn3z4C5LeMEKA/QRAmU4AJ9TJ5oh7EnUdJNLHryEx3RxNJ0CXwCfe2eY
e8Qww/i+MMBA8sgRJVvV+Z8=
=UURD
-----END PGP SIGNATURE-----


