[Bioperl-l] Bringing Bioperl annotations in-line.

Ewan Birney birney@ebi.ac.uk
Fri, 6 Apr 2001 09:34:31 +0100 (BST)


It looks like we have a real, wholesale annotation rewrite in the
offing. I think this is a Good Thing, as this is definitely an area
where we've just thrown together objects that "work for one format".

As always, Bioperl rules apply: whoever codes it wins the argument
(we have never had two people competitively coding to win an
argument; this could be the first).


Ok. Let's start from a reasonably blank sheet and write down some
requirements for this baby:


Overall Goals
-------------

  - Should remain as backward compatible as possible (possibly using
Perl magic to fake it)

  - Should be reusable across a number of different objects,
minimally:

       Sequence objects themselves
       Sequence features
         "rich" sequence features, e.g., genes
       Other-biological-objects (eg Alignments)


Now, some use cases:
--------------------

Sequence objects -
      
      Have References, Comments, Database Links (DR lines)

      (currently this is modelled inside Bioperl)


Sequence features - 

      Have qualifiers (currently modelled as has_tag_value,
each_tag_value). These should come from a restricted vocabulary.

      Have DR lines (dbxref comments). These are not automatically
promoted to DBLinks (bad Bioperl!).
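To make the two points above concrete, here is a minimal Python sketch (not Bioperl code) of a feature whose qualifiers are checked against a restricted vocabulary and whose `db_xref` values of the form `DB:ID` are promoted to structured link objects instead of staying as free text. `ALLOWED_TAGS`, `DBLink` and `Feature` are hypothetical names, and the vocabulary is a tiny illustrative subset, not the real Feature Table qualifier list; only `has_tag_value` and `each_tag_value` mirror the method names mentioned above.

```python
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny qualifier vocabulary (illustration only).
ALLOWED_TAGS = {"gene", "note", "product", "db_xref"}


@dataclass(frozen=True)
class DBLink:
    """A structured database cross-reference such as "SWISS-PROT:P00722"."""
    database: str
    primary_id: str


@dataclass
class Feature:
    tags: dict = field(default_factory=dict)     # tag -> list of values
    dblinks: list = field(default_factory=list)  # db_xref values promoted to DBLink

    def add_tag_value(self, tag, value):
        if tag not in ALLOWED_TAGS:
            raise ValueError(f"qualifier {tag!r} is not in the restricted vocabulary")
        if tag == "db_xref" and ":" in value:
            # Promote "DB:ID" text to a DBLink instead of storing it as a plain tag.
            database, _, primary_id = value.partition(":")
            self.dblinks.append(DBLink(database, primary_id))
            return
        self.tags.setdefault(tag, []).append(value)

    def has_tag_value(self, tag):
        return tag in self.tags

    def each_tag_value(self, tag):
        return list(self.tags.get(tag, []))
```

With this shape, `add_tag_value("db_xref", "SWISS-PROT:P00722")` ends up in `dblinks` rather than in the tag table, while an unknown qualifier is rejected outright. A `db_xref` without a colon is kept as a plain tag, which is one possible policy, not the only one.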



Rich Sequence Features (eg Genes) 

      Currently have only the qualifier support we have on Features

      We would like everything we can put on sequences (References,
Comments, Database Links) and also genuine controlled-vocabulary annotation (e.g., GO)




Other Biological Objects


     I think the assumption is that if we can do everything we want to do
with Genes, and there is nothing "gene specific" about the code, this
should work for other objects too.
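One way to read "nothing gene specific" is a single annotation container that any object can mix in. The Python sketch below assumes that design; `AnnotationCollection`, `Annotatable` and the three toy classes are hypothetical names, and the string keys ("comment", "reference", "dblink") are just illustrative groupings.

```python
from collections import defaultdict


class AnnotationCollection:
    """Holds annotations grouped by a key such as 'reference', 'comment' or 'dblink'."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, key, annotation):
        self._by_key[key].append(annotation)

    def get(self, key):
        # .get() on a defaultdict does not create the key, so lookups have no side effects.
        return list(self._by_key.get(key, []))

    def keys(self):
        return sorted(self._by_key)


class Annotatable:
    """Mixin: any object gains an annotation collection; nothing here is gene-specific."""

    @property
    def annotation(self):
        # Create the collection lazily so classes using the mixin need no special __init__.
        if not hasattr(self, "_annotation"):
            self._annotation = AnnotationCollection()
        return self._annotation


class Sequence(Annotatable):
    def __init__(self, seq):
        self.seq = seq


class Gene(Annotatable):
    def __init__(self, name):
        self.name = name


class Alignment(Annotatable):
    def __init__(self, rows):
        self.rows = rows
```

The design choice here is composition through one shared container: sequences, genes and alignments all get the same `annotation` accessor, so code written against genes works unchanged on the other objects.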


Other Issues
------------

Audit-Trail ideas

     Annotations should always carry an audit trail recording why the
annotation was associated, e.g.,

   BY_SIMILARITY_INFERENCE(thing-inferred-from),
   BY_EXPERIMENT(paper-reference),
   BY_CURATOR(curator-name)

GO and Swissprot share a common idea about evidence tagging. We should
probably just slipstream their work here.

This is definitely a "Good Thing".
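A minimal Python sketch of the audit-trail rule, assuming each annotation must be constructed together with an evidence record. It uses only the three evidence kinds listed above; it does not reproduce the actual GO or Swissprot evidence codes, and `EvidenceType`, `Evidence`, `Annotation` and `audit_trail` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceType(Enum):
    BY_SIMILARITY_INFERENCE = "similarity"  # source: the thing inferred from
    BY_EXPERIMENT = "experiment"            # source: a paper reference
    BY_CURATOR = "curator"                  # source: the curator's name


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceType
    source: str


@dataclass(frozen=True)
class Annotation:
    key: str
    value: str
    evidence: Evidence

    def __post_init__(self):
        # Refuse to create an annotation without an audit trail.
        if not isinstance(self.evidence, Evidence) or not self.evidence.source:
            raise ValueError("every annotation needs evidence with a non-empty source")

    def audit_trail(self):
        # Render in the KIND(source) style used in the list above.
        return f"{self.evidence.kind.name}({self.evidence.source})"
```

Making evidence a required constructor argument means the audit trail cannot be forgotten, rather than being an optional field that later code has to check for.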


ControlledVocab abstraction

   GO is nothing cleverer than a controlled vocabulary. Quite a lot of
annotation looks like it is going to move that way. Perhaps we should
abstract this out into a common set of classes.
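A rough Python sketch of what a common controlled-vocabulary abstraction might look like, assuming terms linked by is-a relationships into a DAG, so a term can have several parents. GO would then be one instance of `ControlledVocabulary` rather than a special case. The class names and the term identifiers in the usage below are hypothetical, not real GO IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    identifier: str
    name: str
    parents: list = field(default_factory=list)  # identifiers of is-a parents


class ControlledVocabulary:
    """A generic term store; a vocabulary such as GO would be one instance."""

    def __init__(self, name):
        self.name = name
        self._terms = {}

    def add_term(self, term):
        # Parents must already exist, so every is-a link points at a known term.
        for parent in term.parents:
            if parent not in self._terms:
                raise KeyError(f"unknown parent term {parent!r}")
        self._terms[term.identifier] = term

    def get(self, identifier):
        return self._terms[identifier]

    def ancestors(self, identifier):
        """All terms reachable through is-a links (depth-first, each visited once)."""
        seen = set()
        stack = list(self._terms[identifier].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._terms[current].parents)
        return seen
```

For example, with hypothetical terms `T:1`, `T:2` (is-a `T:1`) and `T:3` (is-a both `T:2` and `T:1`), `ancestors("T:3")` is `{"T:1", "T:2"}`.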



Implementation Issues:
---------------------

     Split Interfaces from Implementation


     Neither SeqFeature nor the new "Annotation" object should become a
massive "blob" of every method someone wants to have. This means the
interfaces should be flexible enough that if you "just want to add a
seqfeature" you do not need to write methods that support yadda-yadda
annotation retrieval.
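The interface split can be sketched in Python with abstract base classes standing in for Perl "I" modules. This assumes one small feature interface plus a separate, optional annotation interface; `SeqFeatureI`, `AnnotatableI`, `SimpleFeature` and `GeneFeature` are hypothetical names, and coordinates are assumed to be 1-based and inclusive.

```python
from abc import ABC, abstractmethod


class SeqFeatureI(ABC):
    """Minimal feature interface: only what every feature must provide."""

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def end(self): ...

    def length(self):
        # Concrete helper built only on the abstract methods (1-based, inclusive).
        return self.end() - self.start() + 1


class AnnotatableI(ABC):
    """Separate, optional interface for objects that carry rich annotation."""

    @abstractmethod
    def annotation_keys(self): ...

    @abstractmethod
    def get_annotations(self, key): ...


class SimpleFeature(SeqFeatureI):
    # "Just want to add a seqfeature": no annotation methods are required.
    def __init__(self, start, end):
        self._start, self._end = start, end

    def start(self):
        return self._start

    def end(self):
        return self._end


class GeneFeature(SimpleFeature, AnnotatableI):
    # A rich feature opts in to annotation by also implementing AnnotatableI.
    def __init__(self, start, end):
        super().__init__(start, end)
        self._annotations = {}

    def add_annotation(self, key, value):
        self._annotations.setdefault(key, []).append(value)

    def annotation_keys(self):
        return sorted(self._annotations)

    def get_annotations(self, key):
        return list(self._annotations.get(key, []))
```

A simple feature implements two methods and is done. Only features that actually want annotation take on the second interface.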



    It needs to play well with EMBL/GenBank, Swissprot and Feature Table
dumping in particular, and also with GFF. Playing well with GAME would be
nice, but GAME is still only mission-critical for the BDGP guys.
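As an illustration of "playing well with GFF", here is a Python sketch that dumps one feature and its tag/value qualifiers as a tab-separated, nine-column line (sequence name, source, feature type, start, end, score, strand, frame, attributes) in roughly GFF version 2 style. The function name `to_gff2` is hypothetical, and quoting and escaping of attribute values are deliberately not handled.

```python
def to_gff2(seqname, source, feature_type, start, end, strand, tags,
            score=None, frame=None):
    """Format one feature as a tab-separated GFF2-style line.

    tags maps a qualifier name to a list of values, as on a sequence feature.
    """
    # Attributes are written as: tag "value"; tag "value" (sorted for stable output).
    attributes = "; ".join(
        f'{tag} "{value}"' for tag in sorted(tags) for value in tags[tag]
    )
    columns = [
        seqname, source, feature_type, str(start), str(end),
        "." if score is None else str(score),   # "." marks an empty column
        strand,
        "." if frame is None else str(frame),
        attributes,
    ]
    return "\t".join(columns)
```

Because the attributes column is built from the same tag/value store the feature already has, the GFF dumper needs nothing beyond the minimal qualifier interface.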






Ok. So what I would like to suggest is that Mark, Hilmar, Chris, myself
and anyone else who wants to play come up with proposals (or modify each
other's proposals) as real code, such as


   Bio::AnnotationI
     defines ->get_ReallyInterestingStuff()



and, of course, once we have kicked those ideas around for a while,
whoever does the final coding gets to make the final decisions.



Sounds fair?





-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------