tags and annotations, was Re: [Bioperl-l] Annotated.pm
Steffen Grossmann
grossman at molgen.mpg.de
Mon Nov 29 04:12:31 EST 2004
I see the problem now. I wrote it like this to make sure that
$feat->source always gives back something, so that my scripts don't die
when I call $feat->source->value. Of course, getting back too many
objects is also a problem and should be fixed...
But!
I am even more concerned that it is not yet clear how tags
and annotations should be handled in the future. Of course, typed
annotations are a good thing, especially when it comes to more complex
objects like 'OntologyTerms' or 'DBLinks'. But, frankly, I sometimes
think that they are overkill for 'simple values'. For example, in the
Bio::SeqFeature::Annotated::score method, none of the checking of
whether the value is appropriate is provided by using
Bio::Annotation::SimpleValue. We could, of course, start to implement
all kinds of other Bio::AnnotationI classes like
'Bio::Annotation::SimpleScore' or 'Bio::Annotation::SimpleString', but do
we really want this?
I, personally, would like a tag/annotation scheme to fulfil the
following (I am guided by GFF3):
1) An important thing, which is missing up to now, is a mechanism
which makes sure that there is only _one single_ entry under a certain
tag. This is, e.g., important when setting the ID of a feature in the
sense of GFF3.
2) In some cases we want to have several values stored under a tag, but
we want them to be unique. This is, e.g., the case when giving the
parents of a feature in the sense of GFF3. (By the way, getting/setting
the parents should be connected to the get_SeqFeature and
remove_SeqFeature methods directly.)
3) Sometimes typing is important, e.g., when (again GFF3)
setting/getting 'Dbxref' attributes and, of course, also when talking
about the 'type' of a feature, which has to come from Sequence Ontology
in GFF3.
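
The three requirements above can be sketched without BioPerl. This is
only a minimal illustration under stated assumptions: the TagStore
package, its 'single'/'unique' modes and the 'check' callback are
hypothetical names invented here, not existing BioPerl API, and the
'check' callback is a plain validator standing in for real typed
annotation objects.

```perl
use strict;
use warnings;

# Hypothetical sketch, not part of BioPerl: a tag store in which every
# 'standard' tag carries a declared policy.
package TagStore;

# mode 'single' = exactly one value; mode 'unique' = several distinct values.
# An optional 'check' code ref validates each value before it is stored.
sub new {
    my ($class, %policy) = @_;
    return bless { policy => {%policy}, values => {} }, $class;
}

sub add {
    my ($self, $tag, $value) = @_;
    # Tags without a declared policy are 'custom': anything goes.
    my $rule = $self->{policy}{$tag} || { mode => 'multi' };
    if (my $check = $rule->{check}) {
        die "invalid value '$value' for tag '$tag'\n" unless $check->($value);
    }
    my $list = $self->{values}{$tag} ||= [];
    if ($rule->{mode} eq 'single') {
        @$list = ($value);                  # replace, never accumulate
    }
    elsif ($rule->{mode} eq 'unique') {
        push @$list, $value unless grep { $_ eq $value } @$list;
    }
    else {
        push @$list, $value;
    }
    return $self;                           # allow chained calls
}

sub get {
    my ($self, $tag) = @_;
    my @vals = @{ $self->{values}{$tag} || [] };
    my $rule = $self->{policy}{$tag};
    # Single-valued tags return one scalar regardless of calling context.
    return $vals[0] if $rule && $rule->{mode} eq 'single';
    return @vals;
}

package main;

my $tags = TagStore->new(
    ID     => { mode => 'single' },
    Parent => { mode => 'unique' },
    Dbxref => { mode => 'unique', check => sub { $_[0] =~ /^[^:\s]+:\S+$/ } },
);

$tags->add(ID => 'gene0001')->add(ID => 'gene0002');    # second call replaces
$tags->add(Parent => 'mRNA1')->add(Parent => 'mRNA1')->add(Parent => 'mRNA2');
$tags->add(Dbxref => 'EMBL:AA816246');

my $id      = $tags->get('ID');
my @parents = $tags->get('Parent');
print "ID: $id\n";
print "Parents: @parents\n";
```

Keeping the policy table separate from the stored values is one way to
let strict 'standard' tags and free-form 'custom' tags live side by
side; whether that belongs in the annotation collection or in the
feature class is left open here.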
I think that we should distinguish between 'standard' annotation and
'custom' annotation. On the use and typing of 'standard' annotations we
should agree community-wide, whereas for 'custom' annotation this is
left to the user (although bioperl should help in dealing with it).
Methods like 'seqid', 'source', 'type', etc. are all concerned with
standard annotation (although I bet we don't have a well-defined way of
thinking about them...), but currently they are implemented using
mechanisms which have been written to deal with custom annotation! This,
I think, is the main problem. When we want annotation which is
completely compatible with, e.g., GFF3, we want to be very strict (since
GFF3 is quite well-defined) and we don't want it to interfere with other
kinds of annotation.
This we haven't solved yet (although, while writing this email, I am
starting to get ideas about how to do it...).
Steffen
Hilmar Lapp wrote:
> I should add that even Annotated::source() is documented as returning
> a single object:
>
> Returns : a Bio::Annotation::SimpleValue object representing the source.
>
> In this case it is easier to detect the problem because a number is not a
> reference, so the first attempt to de-reference it will cause a script
> to die. (Although I wouldn't want to be the person having to debug the
> original cause ...)
>
> See my point?
>
> -hilmar
>
> On Saturday, November 27, 2004, at 12:32 AM, Hilmar Lapp wrote:
>
>>
>> But this is not what I meant. As an example, $annotated->source_tag()
>> will delegate to source():
>>
>> sub source_tag {
>>     return shift->source(@_);
>> }
>>
>> Annotated::source returns what the get_Annotations() short-cut returns:
>>
>> return $self->get_Annotations('source');
>>
>> If somebody accidentally added another annotation with tag 'source',
>> not knowing that it is being used internally, the next call to
>> $annotated->source_tag() will return a number, not the source tag,
>> and not an array of source tags.
>>
>> This is what I mean by brittle. I mean that it is easy to hang
>> yourself as a user and you don't even get a warning before you die.
>> In this case it will even be a very slow death since a number is
>> still a scalar, and in order to realize the problem you need to
>> actually see that it is a number and not a meaningful string.
>>
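
The failure mode described in the quoted message can be reproduced
without BioPerl. This is a minimal sketch under an assumption: the
get_annotations() stand-in below is hypothetical and simply models the
behaviour the quoted text describes (one stored value comes back as
itself, several come back as a list, which in scalar context collapses
to a count). It is not the actual BioPerl implementation.
source_strict() shows one possible guard that fails loudly instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the get_Annotations() short-cut.
my %annotations = (source => ['GenBank']);

sub get_annotations {
    my ($tag) = @_;
    my @found = @{ $annotations{$tag} || [] };
    # With several values, @found in scalar context is the element count.
    return @found == 1 ? $found[0] : @found;
}

# Fragile accessor: passes through whatever the short-cut gives back.
sub source_fragile { return get_annotations('source') }

# Stricter accessor: insists on exactly one value and dies otherwise.
sub source_strict {
    my @found = @{ $annotations{source} || [] };
    die "expected exactly one 'source' annotation, found "
        . scalar(@found) . "\n"
        unless @found == 1;
    return $found[0];
}

my $one = source_fragile();                 # one entry stored
push @{ $annotations{source} }, 'mine';     # somebody adds a second 'source'
my $two = source_fragile();                 # now silently a count

print "fragile, one entry:   $one\n";
print "fragile, two entries: $two\n";

my $ok = eval { source_strict(); 1 };
print "strict, two entries:  ", ($ok ? "ok\n" : "died: $@");
```

Dying at the accessor at least points at the tag that was misused,
rather than leaving a bare number to surface somewhere downstream.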
--
%---------------------------------------------%
% Steffen Grossmann %
% %
% Max Planck Institute for Molecular Genetics %
% Computational Molecular Biology %
%---------------------------------------------%
% Ihnestrasse 73 %
% 14195 Berlin %
% Germany %
%---------------------------------------------%
% Tel: (++49 +30) 8413-1167 %
% Fax: (++49 +30) 8413-1152 %
%---------------------------------------------%