[Bioperl-l] Annotation structure

Hilmar Lapp hilmarl@yahoo.com
Sat, 11 Aug 2001 23:00:22 +0200


Ewan Birney wrote:
> 
> [cc'ing matt and thomas in because I want to understand their design
> decision in biojava]
> 
> As mentioned at BOSC, I want to overhaul the annotation
> structure. 
> 
> The proposal is to head towards more of a generic tag => list of values
> scheme which will (a) extend better and (b) play much better with biojava
> and biocorba. My current proposal is this:
> 
> Bio::Annotation moves to Bio::AnnotationCollection.
> 
> Bio::AnnotableI (direct copy from biojava) defines the method
> 
>    $obj->annotation();
> 
> which gives back a Bio::AnnotationCollection
> 
> Bio::AnnotationCollection is:
> 
> =head1 NAME
> 
> Bio::AnnotationCollectionI - Interface for annotation collections
> 
> =head1 SYNOPSIS
> 
>    # get an AnnotationCollectionI somehow, eg
> 
>    $ac = $seq->annotation();
> 
>    foreach $key ( $ac->get_all_annotation_keys() ) {
>        @values = $ac->get_Annotations($key);
>        foreach $value ( @values ) {
>           # $value is a Bio::AnnotationI, and defines a "string" method
>           print "Annotation ",$key," stringified value ",$value->string,"\n";
>        }
>    }
> 
> 
> (a) I always feel we have one class too many here - I sort of want to
> remove AnnotableI and make Seq inherit from AnnotationCollectionI. But
> this is the way biojava does it (which may well be due to how we did it in
> the first place). This relates to (c) below.

I'd make it has-a rather than is-a AnnotationCollectionI. Same for
SeqFeatureI. The reasons are that an implementor of SeqI or
SeqFeatureI can solve the AnnotationCollectionI methods simply by
returning null if for his/her particular case annotations are not
applicable. Second, AnnotationCollection objects can be re-used,
and also easily replaced. Making SeqFeatureI and SeqI inherit from
AnnotationCollectionI doesn't seem a lightweight solution.
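
To make the has-a idea concrete, here is a minimal sketch. My::Seq and
My::AnnotationCollection are hypothetical stand-ins invented for this
illustration, not existing bioperl classes; only the method names
annotation(), get_Annotations() and get_all_annotation_keys() are taken
from the proposal above, and add_Annotation() is an assumed helper.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an AnnotationCollectionI implementation.
package My::AnnotationCollection;

sub new {
    my ($class) = @_;
    return bless { _ann => {} }, $class;
}

# Assumed helper: store one more value under a key.
sub add_Annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{_ann}{$key} }, $value;
    return;
}

sub get_all_annotation_keys {
    my ($self) = @_;
    return sort keys %{ $self->{_ann} };
}

sub get_Annotations {
    my ($self, $key) = @_;
    return @{ $self->{_ann}{$key} || [] };
}

# Hypothetical sequence class that has-a collection instead of being one.
package My::Seq;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Getter/setter. Returns undef when no collection has been attached,
# which is all an implementor without annotations needs to provide.
sub annotation {
    my ($self, $ac) = @_;
    $self->{_annotation} = $ac if @_ > 1;
    return $self->{_annotation};
}

package main;

my $ac = My::AnnotationCollection->new;
$ac->add_Annotation(comment => 'putative kinase');

my $seq = My::Seq->new;
$seq->annotation($ac);

# The same collection object can be re-used by (or swapped into) another object.
my $other = My::Seq->new;
$other->annotation($seq->annotation);

print join(', ', $other->annotation->get_Annotations('comment')), "\n";
```

The point of the sketch is only that the sequence class carries one
accessor, while everything annotation-specific stays in the collection.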

As for the AnnotatableI/AnnotationCollectionI question, I'm not
sure we really need the presence of a method annotation()
indicated by ->isa('Bio::AnnotatableI') (that's what it boils down
to, isn't it). Unless we want to leave the annotatability of
SeqFeatureI and SeqI objects to the implementations. Otherwise we
simply add annotation() to the interfaces that should be
annotatable. Disadvantage is that you cannot distinguish easily
then whether an object doesn't have annotation or is not supposed
to have it.

Well, I'm not sure. :o

> 
> (b) We've got to have some additional set of "standard" keys, like
> 
>    reference, dblink, comment
> 
> etc to agree on. That's ok - that's what you live with for extensibility,
> but there is an argument that you might want something more hierarchical
> such that
> 
>     @objects = $ann->get_Annotation("geneticdisease")
> 
> would give you back Bio::Something::Disease::Genetic but
> 
>     @objects = $ann->get_Annotation("disease")
> 
> gives back the superset. Some hierarchical type system (centrally
> controlled?) controls the standard. (good? Bad?)
>

I have to admit that I do like an annotation design that supports
hierarchy explicitly: because annotation in many cases /is/
hierarchical (in meaning).
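
One lightweight way to support this, sketched under assumptions: a flat
store plus a child => parent map over tags, so that asking for "disease"
also returns what was filed under "geneticdisease". The map, the sample
data and the function names is_a() and get_annotations() are all made up
for illustration and are not part of any existing interface.

```perl
use strict;
use warnings;

# Hypothetical tag hierarchy: child tag => parent tag.
my %parent_of = (
    geneticdisease    => 'disease',
    infectiousdisease => 'disease',
);

# Hypothetical flat storage: tag => list of values.
my %annotations = (
    disease        => ['unclassified disorder'],
    geneticdisease => ['cystic fibrosis'],
    comment        => ['unrelated note'],
);

# True if $tag equals $ancestor or descends from it in %parent_of.
sub is_a {
    my ($tag, $ancestor) = @_;
    while (defined $tag) {
        return 1 if $tag eq $ancestor;
        $tag = $parent_of{$tag};
    }
    return 0;
}

# Return the values stored under $wanted and under all of its descendant tags.
sub get_annotations {
    my ($wanted) = @_;
    return map  { @{ $annotations{$_} } }
           grep { is_a($_, $wanted) }
           sort keys %annotations;
}

print join('; ', get_annotations('disease')), "\n";
print join('; ', get_annotations('geneticdisease')), "\n";
```

The storage stays flat here; only the lookup knows about the hierarchy,
which may keep the machinery behind the interface fairly small.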

> After thinking about this I don't like it - it is asking for quite a heavy
> system behind the scenes (not so heavy, but heavy enough) to manage this
> and will make implementing other objects behind this interface
> tough. Hmph.
> 
> In general, if we do set a standard set of tags, to what extent should we
> enforce the tag-->object mapping. I'm leaning towards relatively strictly
> enforcing it with a hash in AnnotationCollectionI being something like
> 
> %tag_object_map = (
>         'reference' => 'Bio::Annotation::ReferenceI',
>         'dblink'    => 'Bio::Annotation::DBLinkI',
>         'comment'   => 'Bio::Annotation::CommentI' );
> 
> with the idea that implementations enforce these rules of their annotation
> collections.

I agree with a rather strong typing. It helps avoid and/or find
programming errors, and makes for clearer code (though not
necessarily simpler code).
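
A sketch of what such enforcement could look like in an implementation.
The class names under My:: are hypothetical placeholders for the
Bio::Annotation::* interfaces in the quoted map, add_Annotation() is an
assumed method name, and letting unregistered tags through unchecked is
my assumption about how extensibility would be kept.

```perl
use strict;
use warnings;
use Carp ();
use Scalar::Util ();

# Hypothetical annotation class standing in for Bio::Annotation::CommentI.
package My::Annotation::Comment;

sub new {
    my ($class, $text) = @_;
    return bless { text => $text }, $class;
}

# Hypothetical collection that enforces the tag => class mapping.
package My::TypedCollection;

my %tag_object_map = (
    reference => 'My::Annotation::Reference',
    comment   => 'My::Annotation::Comment',
);

sub new {
    my ($class) = @_;
    return bless { _ann => {} }, $class;
}

sub add_Annotation {
    my ($self, $key, $obj) = @_;
    # Standard tags must carry an object of the registered class;
    # tags that are not registered are accepted as-is (an assumption).
    if (my $required = $tag_object_map{$key}) {
        Carp::croak("tag '$key' requires a $required object")
            unless Scalar::Util::blessed($obj) && $obj->isa($required);
    }
    push @{ $self->{_ann}{$key} }, $obj;
    return;
}

package main;

my $tc = My::TypedCollection->new;
$tc->add_Annotation(comment => My::Annotation::Comment->new('looks fine'));

# A plain string under a standard tag is rejected at insertion time.
my $ok = eval { $tc->add_Annotation(comment => 'plain string'); 1 };
print "rejected: $@" unless $ok;
```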

> 
> The problem here is that I want to keep backward compatibility with the
> current has_tag_value, each_tag_value system on SeqFeatureI reusing the
> AnnotationI ->string method to allow these to be put in. This means I want
> 
>   SeqFeatureI to inherit from AnnotationCollectionI
>

See above regarding is-a vs. has-a.
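
Backward compatibility does not seem to require inheritance: the old
tag methods can delegate to the contained collection. A rough sketch,
where all My:: classes are hypothetical and only the method names
has_tag_value, each_tag_value, annotation, get_Annotations and string
come from this thread (add_Annotation is an assumed helper):

```perl
use strict;
use warnings;

# Hypothetical AnnotationI-like value exposing a string method.
package My::SimpleAnnotation;

sub new {
    my ($class, $text) = @_;
    return bless { text => $text }, $class;
}

sub string {
    my ($self) = @_;
    return $self->{text};
}

# Hypothetical minimal collection.
package My::Collection;

sub new {
    my ($class) = @_;
    return bless { _ann => {} }, $class;
}

sub add_Annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{_ann}{$key} }, $value;
    return;
}

sub get_Annotations {
    my ($self, $key) = @_;
    return @{ $self->{_ann}{$key} || [] };
}

# Hypothetical feature class: has-a collection, keeps the old tag methods.
package My::SeqFeature;

sub new {
    my ($class) = @_;
    return bless { _ac => My::Collection->new }, $class;
}

sub annotation {
    my ($self) = @_;
    return $self->{_ac};
}

# Old-style methods implemented by delegation to the collection.
sub has_tag_value {
    my ($self, $tag) = @_;
    my @found = $self->annotation->get_Annotations($tag);
    return @found ? 1 : 0;
}

sub each_tag_value {
    my ($self, $tag) = @_;
    return map { $_->string } $self->annotation->get_Annotations($tag);
}

package main;

my $feat = My::SeqFeature->new;
$feat->annotation->add_Annotation(note => My::SimpleAnnotation->new('exon 3'));
print join(',', $feat->each_tag_value('note')), "\n";
```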
 
> 
> I am sorely tempted to try to build other, richer serialisation standards
> in here. This would be sort of like the to_FTHelper system for sequence
> features but perhaps something XML-like. Something like
> 
> # might not be good for large objects
> $xml_string = $annotation->to_XML
> 
> or
> 
> # painful for getting it back to a string, could use IO::String
> $stream = \*STDOUT;
> $annotation->write_XML($stream)
> 
> What do people think here? Useful? I suspect putting something like this
> in is good.

Doesn't this call for something like an AnnotationIO with different
drivers like the SeqIO system?
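
Roughly what I have in mind, as a sketch only. My::AnnotationIO and its
two drivers are hypothetical names; the -format/-fh arguments imitate
the SeqIO calling style, and write_annotation() is an assumed method.
The drivers live in the same file here, so nothing is loaded dynamically.

```perl
use strict;
use warnings;

# Hypothetical front end: picks a driver class from the -format argument.
package My::AnnotationIO;

sub new {
    my ($class, %args) = @_;
    my $format = lc($args{-format} || 'tab');
    my $driver = "My::AnnotationIO::$format";
    die "unknown annotation format '$format'\n"
        unless $driver->can('write_annotation');
    return bless { fh => $args{-fh} || \*STDOUT }, $driver;
}

# Driver writing tab-delimited key/value lines.
package My::AnnotationIO::tab;

sub write_annotation {
    my ($self, $key, $value) = @_;
    print { $self->{fh} } "$key\t$value\n";
    return;
}

# Driver writing a very simple XML element per annotation.
package My::AnnotationIO::xml;

sub write_annotation {
    my ($self, $key, $value) = @_;
    # Escape the characters that are special in XML text and attributes.
    for ($key, $value) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
    }
    print { $self->{fh} } qq{<annotation key="$key">$value</annotation>\n};
    return;
}

package main;

# Write into an in-memory filehandle so the result is available as a string.
my $buf = '';
open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
my $out = My::AnnotationIO->new(-format => 'xml', -fh => $fh);
$out->write_annotation(comment => 'binds ATP & Mg');
close $fh;
print $buf;
```

Writing to a stream and pointing that stream at a string when needed
would cover both the to_XML and the write_XML variants quoted above.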

> 
> Do we need a basis object (experimental/computational/reference) and if
> so, what should it look like?
> 

Can't these be generalized as 'references' kind of things? So,
annotation would have reference() which returns an object that
gives the reference for the annotation, i.e., is it a computation
result, is it from an article, or from an experiment. You would
then have to define a common interface for these ... :o

> Do people have opinions on
> this? Jason/Hilmar/Heikki/Matt/Thomas/Mark+David are the people I am most
> interested in hearing from. Key questions:
> 
>   (a) rigid biojava/biocorba cribbing, or removing this AnnotableI
> interface? (I favour removing)

I tend to favor it too. Retaining BioCorba compatibility is a
must, however.

> 
>   (b) type enforcement of standard types (I like enforcement - it will
> catch otherwise weird looking bugs)

I like it too.

> 
>   (c) type hierarchy or flat (I favour flat)

I'd favor hierarchical, or at least a design that naturally
supports hierarchy.

> 
>   (d) XML serialisation and how to do it (I think it is a good thing, no
> clear ideas how to do it. Has someone done this before and has stories? I
> should bond with Lincoln about Boulder next week)
> 

As mentioned, an AnnotationIO::* system with different drivers
depending on what you want.

> 
> I have the nasty feeling we will regret one of the decisions we make. I
> wonder which one!

I'm afraid I agree. I hope my comments make at least some sense.

	-hilmar


-- 
-----------------------------------------------------------------
Hilmar Lapp                              email: hilmarl@yahoo.com
A-1120 Vienna
-----------------------------------------------------------------

