[Biojava-dev] Annotation conversions
Keith James
kdj at sanger.ac.uk
Tue Dec 16 07:10:36 EST 2003
>>>>> "Len" == Len Trigg <len at reeltwo.com> writes:
Len> Hi folks,
Len> I'd like to add support for BioSQL's comment table to our
Len> binding, and am wondering about the best way to do it. It is
Len> obvious that the comments should just be annotations, my
Len> question is more about what key I should associate the
Len> comments with (and should look for when storing sequences in
Len> the database).
Len> It seems that I could use either "CC" or "COMMENT" and the
Len> right thing would happen most of the time. However, it seems
Len> a bit silly to have to check for both keys when persisting
Len> comments to the database. Should there be a "canonical" key
Len> that is used for comments? Then different I/O formats could
Len> just check for this one key, rather than having to do
Len> things like this:
Len> GenbankFileFormer.java:322
Len>     else if (key.equals("CC") || key.equals("COMMENT")) {
Len>         ccb = new StringBuffer(sequenceBufferCreator("COMMENT ", value));
Len>     }
Yeah - what we have now is really ugly, and a maintenance headache
too.
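A minimal sketch of the "canonical key" idea, assuming plain string keys and a hypothetical helper class `AnnotationKeyNormalizer` (not existing BioJava code): format-specific tags such as "CC" (the EMBL/SwissProt comment line code) and GenBank's "COMMENT" are mapped to one canonical key when annotations are read, so a writer only has to look for that one key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of BioJava): maps format-specific annotation
// keys onto a single canonical key so writers only have to check one name.
final class AnnotationKeyNormalizer {
    // Canonical spelling chosen for illustration only.
    static final String COMMENT = "COMMENT";

    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("CC", COMMENT);      // EMBL / SwissProt comment line code
        ALIASES.put("COMMENT", COMMENT); // GenBank keyword
    }

    private AnnotationKeyNormalizer() {}

    // Returns the canonical key for a known alias, or the key unchanged.
    static String canonicalize(String key) {
        return ALIASES.getOrDefault(key, key);
    }

    // Copies an annotation map, rewriting every key to its canonical form.
    // Values that collide on the same canonical key are joined with a newline.
    static Map<String, String> normalize(Map<String, String> annotations) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : annotations.entrySet()) {
            out.merge(canonicalize(e.getKey()), e.getValue(), (a, b) -> a + "\n" + b);
        }
        return out;
    }
}
```

With something like this in place, the GenBank writer's test could shrink to a single `key.equals(AnnotationKeyNormalizer.COMMENT)`, and the BioSQL binding could store and look up comments under that one key.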
Len> The same also applies to any of the annotations that are
Len> shared between multiple output formats...
Len> Suggestions?
Does anyone know of practical standards for this sort of biological
metadata? It's that semantic heterogeneity problem again. Some of
these fields are present in many formats, e.g. SwissProt, PDB and
BSML, but they don't always mean the same thing. I wish there were a
way of finding which fields mean the same thing in different datasets.
I've been googling and searching PubMed, but haven't found anything
immediately helpful (i.e. simple, practical and applicable). I don't
think we should over-engineer this. "Canonical" keys would be one
way. I would favour typed enums rather than, say, ints or strings,
possibly with a suitable toString() for UI presentation. There should
also be a method to get a definition (or a ResourceBundle of
definitions) so we don't slip right back into the problem of semantics.
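The typed-enum suggestion might look something like the following sketch. `AnnotationKey`, its constants and the definition texts are illustrative assumptions, not an agreed vocabulary. It uses the Java `enum` construct, which arrived with J2SE 5.0 in 2004, after this message; at the time, "typed enums" would have meant the typesafe-enum class pattern, which gives similar guarantees with more boilerplate.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical canonical annotation keys (illustrative, not BioJava API).
// Each constant carries a display label and a built-in definition, so the
// meaning of a key travels with the key itself.
enum AnnotationKey {
    COMMENT("Comment", "Free-text remarks about the entry."),
    ACCESSION("Accession", "The primary accession number of the entry."),
    KEYWORDS("Keywords", "Words or phrases used to index the entry.");

    private final String label;
    private final String definition;

    AnnotationKey(String label, String definition) {
        this.label = label;
        this.definition = definition;
    }

    // Human-readable name for UI presentation.
    @Override
    public String toString() {
        return label;
    }

    // Built-in definition of what this key means.
    public String definition() {
        return definition;
    }

    // Looks the definition up in a ResourceBundle keyed by constant name
    // (e.g. for localised text), falling back to the built-in definition.
    public String definition(ResourceBundle bundle) {
        try {
            return bundle.getString(name());
        } catch (MissingResourceException e) {
            return definition;
        }
    }

    // Example in-code bundle; real code might load one with ResourceBundle.getBundle.
    static final class Definitions extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"COMMENT", "Free-text remarks about the entry (from bundle)."}
            };
        }
    }
}
```

Because the key is a type rather than a string, a typo such as `AnnotationKey.COMENT` fails at compile time instead of silently creating a new annotation.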
It would be great if the sets of "keys" and the annotation builder
itself were plugins. Then we could switch from flat lists to
ontologies without disturbing things too much. Ontologies would be
great, but pragmatically not a runner just yet.
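To make the key set and the annotation builder pluggable, one option (again a hypothetical sketch, not BioJava API) is to have the builder depend only on an interface. `KeyVocabulary`, `FlatKeyVocabulary` and `PluggableAnnotationBuilder` are invented names; an ontology-backed vocabulary could later implement the same interface without changes to the builder.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical plugin interface: a vocabulary of canonical annotation keys.
// A flat list and an ontology-backed vocabulary could both implement it.
interface KeyVocabulary {
    // Maps a format-specific key (e.g. "CC") to a canonical key, if known.
    Optional<String> canonicalKey(String formatKey);

    // Returns a human-readable definition of a canonical key, if known.
    Optional<String> definition(String canonicalKey);
}

// Simplest possible plugin: flat tables of aliases and definitions.
final class FlatKeyVocabulary implements KeyVocabulary {
    private final Map<String, String> aliases = new HashMap<>();
    private final Map<String, String> definitions = new HashMap<>();

    // Registers a canonical key, its definition and any format-specific aliases.
    FlatKeyVocabulary define(String canonicalKey, String definition, String... formatKeys) {
        definitions.put(canonicalKey, definition);
        aliases.put(canonicalKey, canonicalKey);
        for (String formatKey : formatKeys) {
            aliases.put(formatKey, canonicalKey);
        }
        return this;
    }

    @Override
    public Optional<String> canonicalKey(String formatKey) {
        return Optional.ofNullable(aliases.get(formatKey));
    }

    @Override
    public Optional<String> definition(String canonicalKey) {
        return Optional.ofNullable(definitions.get(canonicalKey));
    }
}

// Annotation builder that knows only the KeyVocabulary interface;
// keys the vocabulary does not recognise are kept under their original name.
final class PluggableAnnotationBuilder {
    private final KeyVocabulary vocabulary;
    private final Map<String, String> annotations = new HashMap<>();

    PluggableAnnotationBuilder(KeyVocabulary vocabulary) {
        this.vocabulary = vocabulary;
    }

    PluggableAnnotationBuilder add(String formatKey, String value) {
        String key = vocabulary.canonicalKey(formatKey).orElse(formatKey);
        annotations.put(key, value);
        return this;
    }

    Map<String, String> build() {
        return Collections.unmodifiableMap(new HashMap<>(annotations));
    }
}
```

Implementations could also be discovered at run time with `java.util.ServiceLoader` (Java 6 and later), though passing the vocabulary to the constructor, as here, is already enough to swap a flat list for an ontology.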
My 2c.
Keith
--
- Keith James <kdj at sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -