tags and annotations, was Re: [Bioperl-l] Annotated.pm

Steffen Grossmann grossman at molgen.mpg.de
Mon Nov 29 04:12:31 EST 2004


I see this problem now. I wrote it like this to make sure that 
$feat->source always gives back something, so that my scripts don't die 
when I call $feat->source->value. Of course, getting back too many 
objects is also a problem and should be fixed...

But!
I am also more concerned with the fact that it is not yet clear how tags 
and annotations should be handled in the future. Of course, typed 
annotations are a good thing, especially when it comes to more complex 
objects like 'OntologyTerms' or 'DBLinks'. But, frankly, I sometimes 
think that it is overkill for 'simple values'. For example, in the 
Bio::SeqFeature::Annotated::score method, none of the checking for 
appropriateness of the value is provided by using 
Bio::Annotation::SimpleValue. We could, of course, start to implement 
all kinds of other Bio::AnnotationI classes like 
'Bio::Annotation::SimpleScore' or 'Bio::Annotation::SimpleString', but do 
we really want this?
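
To make the 'score' example concrete, here is a minimal sketch in plain Perl of the kind of check that a generic value container does not do for us. The package My::ScoredFeature and its methods are hypothetical and not part of bioperl; I assume here that a valid score is either the GFF 'undefined' marker '.' or anything that Scalar::Util::looks_like_number accepts.

```perl
use strict;
use warnings;

package My::ScoredFeature;   # hypothetical class, not part of bioperl

use Scalar::Util qw(looks_like_number);

sub new { return bless { score => '.' }, shift }

# Combined getter/setter that rejects values which are not scores.
sub score {
    my ($self, $value) = @_;
    if (@_ > 1) {
        die "score must be numeric or '.', got '"
            . (defined $value ? $value : 'undef') . "'\n"
            unless defined $value
                && ($value eq '.' || looks_like_number($value));
        $self->{score} = $value;
    }
    return $self->{score};
}

package main;

my $feat = My::ScoredFeature->new;
$feat->score(42.5);                          # accepted
my $ok = eval { $feat->score('high'); 1 };   # rejected, dies inside eval
print "score is still ", $feat->score, "\n" unless $ok;
```

The check lives in the accessor, so no extra annotation class is needed just to validate one simple value.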

I, personally, would like a tag/annotation scheme to fulfil the 
following (I am guided by GFF3):

1) An important thing that is missing up to now is a mechanism 
which makes sure that there is only _one single_ entry under a certain 
tag. This is important, e.g., when setting the ID of a feature in the 
sense of GFF3.

2) In some cases we want to have several values stored under a tag, but 
we want them to be unique. This is, e.g., the case when giving the 
parents of a feature in the sense of GFF3. (By the way, getting/setting 
the parents should be connected to the get_SeqFeature and 
remove_SeqFeature methods directly)

3) Sometimes typing is important, e.g., when (again GFF3) 
setting/getting 'Dbxref' attributes and, of course, also when talking 
about the 'type' of a feature, which has to come from Sequence Ontology 
in GFF3.
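
Points 1) and 2) could be sketched roughly as follows. This is only an illustration in plain Perl: the package My::TagStore, its policy table and its method names are made up for this email and do not exist in bioperl. Typing (point 3) is left out, and replacing the old value of a 'single' tag rather than dying is just one possible choice.

```perl
use strict;
use warnings;

package My::TagStore;   # hypothetical, for illustration only

# Assumed policy table: 'single' tags hold exactly one value,
# 'unique' tags hold several values without duplicates.
my %POLICY = (ID => 'single', Parent => 'unique');

sub new { return bless { values => {} }, shift }

sub add {
    my ($self, $tag, $value) = @_;
    my $policy = $POLICY{$tag} || 'multi';
    my $list   = $self->{values}{$tag} ||= [];
    if ($policy eq 'single') {
        @$list = ($value);                       # overwrite, never append
    }
    elsif ($policy eq 'unique') {
        push @$list, $value unless grep { $_ eq $value } @$list;
    }
    else {
        push @$list, $value;                     # unconstrained custom tag
    }
    return;
}

# Returns the values stored under a tag as a list (call in list context).
sub get {
    my ($self, $tag) = @_;
    return @{ $self->{values}{$tag} || [] };
}

package main;

my $tags = My::TagStore->new;
$tags->add(ID     => 'gene0001');
$tags->add(ID     => 'gene0002');    # replaces the first ID
$tags->add(Parent => 'chr1');
$tags->add(Parent => 'chr1');        # duplicate, ignored
$tags->add(Note   => 'x');
$tags->add(Note   => 'x');           # custom tag, duplicates kept

my @ids     = $tags->get('ID');
my @parents = $tags->get('Parent');
my @notes   = $tags->get('Note');
print scalar(@ids), " ", scalar(@parents), " ", scalar(@notes), "\n";   # prints "1 1 2"
```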

I think that we should distinguish between 'standard' annotation and 
'custom' annotation. On the use and typing of 'standard' annotations we 
should agree community-wide, whereas for 'custom' annotation this is 
left to the user (although bioperl should help in dealing with it).

Methods like 'seqid', 'source', 'type', etc. are all concerned with 
standard annotation (although I bet we don't have a well defined way of 
thinking about them...), but currently they are implemented using 
mechanisms which have been written to deal with custom annotation! This, 
I think, is the main problem. When we want annotation which is 
completely compatible with, e.g., GFF3, we want to be very strict (since 
GFF3 is quite well-defined) and we don't want it to interfere with other 
kinds of annotation.
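
One way to picture the separation is to keep standard fields and custom tags in two different places, so that a custom tag which happens to be called 'source' cannot disturb the standard one. The following is a minimal sketch in plain Perl under that assumption; My::Feature, add_custom and get_custom are hypothetical names, not existing bioperl API.

```perl
use strict;
use warnings;

package My::Feature;   # hypothetical, for illustration only

sub new { return bless { standard => {}, custom => {} }, shift }

# Strict accessor for a standard field: exactly one value, own storage.
sub source {
    my $self = shift;
    $self->{standard}{source} = shift if @_;
    return $self->{standard}{source};
}

# Free-form custom annotation: any tag, any number of values.
sub add_custom {
    my ($self, $tag, $value) = @_;
    push @{ $self->{custom}{$tag} }, $value;
    return;
}

sub get_custom {
    my ($self, $tag) = @_;
    return @{ $self->{custom}{$tag} || [] };
}

package main;

my $feat = My::Feature->new;
$feat->source('GenBank');
$feat->add_custom(source => 'my pipeline');   # does not touch the standard field
my @custom = $feat->get_custom('source');
print $feat->source, "\n";                    # prints "GenBank"
```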

This we haven't solved yet (although, when writing this email I start to 
get ideas about how to do it...).

Steffen


Hilmar Lapp wrote:

> I should add that even Annotated::source() is documented as returning 
> a single object:
>
> Returns : a Bio::Annotation::SimpleValue object representing the source.
>
> In this it is easier to detect the problem because a number is not a 
> reference, so the first attempt to de-reference it will cause a script 
> to die. (Although I wouldn't want to be the person having to debug the 
> original cause ...)
>
> See my point?
>
> -hilmar
>
> On Saturday, November 27, 2004, at 12:32 AM, Hilmar Lapp wrote:
>
>>
>> But this is not what I meant. As an example, $annotated->source_tag() 
>> will delegate to source():
>>
>> sub source_tag {
>> return shift->source(@_);
>> }
>>
>> Annotated::source returns what the get_Annotations() short-cut returns:
>>
>> return $self->get_Annotations('source');
>>
>> If somebody accidentally added another annotation with tag 'source', 
>> not knowing that it is being used internally, the next call to 
>> $annotated->source_tag() will return a number, not the source tag, 
>> and not an array of source tags.
>>
>> This is what I mean by brittle. I mean that it is easy to hang 
>> yourself as a user and you don't even get a warning before you die. 
>> In this case it will even be a very slow death since a number is 
>> still a scalar, and in order to realize the problem you need to 
>> actually see that it is a number and not a meaningful string.
>>
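
The failure described in the quoted message can be reproduced with a small stand-alone sketch. Note that get_values below is a hypothetical stand-in written only to mimic the behaviour described above (one hit gives the value, several hits give a count in scalar context); it is an assumption, not the actual bioperl implementation of get_Annotations.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a context-sensitive getter.
my %store = (source => ['GenBank']);

sub get_values {
    my ($tag) = @_;
    my @found = @{ $store{$tag} || [] };
    return @found if wantarray;          # list context: everything
    return $found[0] if @found == 1;     # scalar context, one hit: the value
    return scalar @found;                # scalar context otherwise: a count
}

my $before = get_values('source');        # 'GenBank'
push @{ $store{source} }, 'my pipeline';  # someone reuses the tag
my $after  = get_values('source');        # 2, a count rather than a value
my ($first) = get_values('source');       # list context still yields 'GenBank'
print "$before / $after / $first\n";      # prints "GenBank / 2 / GenBank"
```

The caller's code did not change between the two scalar calls, yet the second one silently gets a number back, which is the 'slow death' mentioned in the quote.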

-- 
%---------------------------------------------%
%            Steffen Grossmann                %
%                                             %
% Max Planck Institute for Molecular Genetics %
%      Computational Molecular Biology        %
%---------------------------------------------%
%              Ihnestrasse 73                 %
%               14195 Berlin                  %
%                 Germany                     %
%---------------------------------------------%
%         Tel: (++49 +30) 8413-1167           %
%         Fax: (++49 +30) 8413-1152           %
%---------------------------------------------%



