XML vs AnnotationCollectionI [was Re: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes]

Chris Mungall cjm at fruitfly.org
Tue Nov 23 22:24:21 EST 2004


You know, it strikes me that the whole AnnotationCollectionI framework is
really just a recreation of XML. I'm not sure what the advantages of AC
over XML are - but I can see plenty of advantages of XML over AC.

With XML under the hood, you can still implement the exact same OO methods
for accessing tags and values. But you potentially get more for free:
potentially faster lookup times and smaller memory footprints; various
validation choices - RNG/XML-Schema/DTDs (right now ACs are weakly typed,
which is not good for some software engineering tasks); potentially more
interoperation between bio* projects; powerful querying and transformation
choices; use of standards; auto-serialization.

One idiom I have used in a project was to attach rich annotations to
features using the existing tag-value system, but only using a single tag
of type 'xml', and this served rather nicely.
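
A minimal sketch of that single-tag idiom, written in Python rather than
Perl purely for illustration. The flat dict stands in for a feature's
tag-value store, and the element names, database name and accession are
invented placeholders, not anything taken from BioPerl:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a feature's flat tag-value store:
# tag name -> list of string values.
feature_tags = {"gene": ["abc"]}

# The rich annotation is one string value stored under the single tag 'xml'.
# Element names and values are illustrative placeholders.
feature_tags["xml"] = [
    "<annotation>"
    "<dbxref><db>ExampleDB</db><acc>X0001</acc></dbxref>"
    "<comment>curated</comment>"
    "</annotation>"
]


def rich_annotation(tags):
    """Parse the 'xml' tag value on demand; return None if it is absent."""
    values = tags.get("xml")
    return ET.fromstring(values[0]) if values else None


root = rich_annotation(feature_tags)
# Nested data is reached with a path query instead of nested hash lookups.
accessions = [e.text for e in root.findall("./dbxref/acc")]
```

The ordinary tags stay untouched, so code that only knows about flat
tag-value pairs keeps working and simply sees one opaque 'xml' value.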

What I'm talking about is something more radical, in that the entire
tag-value hash would be replaced by an XML structure. Of course,
convenient get_tag_values-style accessors would remain in place, and would
query this structure.
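
One way this could look, again sketched in Python rather than Perl. The
class name XmlBackedTags and its method set are assumptions modelled on the
*_tag_* accessor names mentioned in this thread; this is not BioPerl's API,
and it assumes every tag name is a valid XML element name:

```python
import xml.etree.ElementTree as ET


class XmlBackedTags:
    """Tag-value accessors that query an XML element instead of a hash."""

    def __init__(self, root=None):
        # The whole tag-value store is a single XML element.
        self.root = root if root is not None else ET.Element("annotation")

    def add_tag_value(self, tag, value):
        # Each value becomes a child element named after its tag.
        ET.SubElement(self.root, tag).text = str(value)

    def has_tag(self, tag):
        return self.root.find(tag) is not None

    def get_all_tags(self):
        # Distinct child element names, in first-seen order.
        seen = []
        for child in self.root:
            if child.tag not in seen:
                seen.append(child.tag)
        return seen

    def get_tag_values(self, tag):
        # Text of every direct child with this name; nested children
        # would need a richer return type than a plain string.
        return [child.text for child in self.root.findall(tag)]

    def remove_tag(self, tag):
        for child in self.root.findall(tag):
            self.root.remove(child)

    def to_xml(self):
        # Serialization comes for free from the XML library.
        return ET.tostring(self.root, encoding="unicode")
```

Callers would use add_tag_value and get_tag_values exactly as they do with
a hash-backed store, while to_xml and path queries on the root element
become available without extra code.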

I may play around with something like this on a clean branch if there's
anyone else who doesn't think this is a mad idea and may actually use the
final results....

The only hindrance I can see is that sometimes AC is used to hold Perl
objects rather than recursive tag-value hashes; there would need to be a
way of auto-reconstituting these - possibly from IDs....
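
A rough sketch of what ID-based reconstitution might involve, in Python for
illustration only. The registry dict, the "ref" attribute and the DBLink
class are all hypothetical names, and the sketch assumes the referenced
objects are still held in memory (or can be reloaded) when the XML is read:

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping object IDs to live objects.
registry = {}


class DBLink:
    """Illustrative stand-in for an annotation object held by reference."""

    def __init__(self, object_id, database, accession):
        self.object_id = object_id
        self.database = database
        self.accession = accession


def store_object(parent, tag, obj):
    """Register the object and record only its ID in the XML structure."""
    registry[obj.object_id] = obj
    ET.SubElement(parent, tag, {"ref": obj.object_id})


def get_tag_values(parent, tag):
    """Return plain text values, or the registered object when 'ref' is set."""
    values = []
    for child in parent.findall(tag):
        ref = child.get("ref")
        values.append(registry[ref] if ref is not None else child.text)
    return values
```

The XML stays plain data, and the accessor decides per element whether to
hand back a string or look up the referenced object.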

Cheers
Chris

On Tue, 23 Nov 2004, Allen Day wrote:

> Fixed.  Here is a summary of what I did to make this happen.  I went ahead
> and did the work necessary to make Bio::SeqFeatureI AnnotatableI instead
> of being itself an AnnotationCollectionI.
>
> . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT
>   Bio::AnnotationCollectionI
> . *_tag_* methods are in Bio::AnnotatableI, and internally defer to
>   Bio::AnnotatableI->annotation->some_analagous_mapped_function()
>   . method behavior is now more similar to original *_tag_* method
>     behavior ; tag "values" are now instantiated as
>     Bio::Annotation::SimpleValue objects by default, unless their name
>     indicates they should be otherwise (e.g. tag name "comment" or
>     "dblink")
> . deprecation warnings commented until 1.6
> . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow
>   new tags to be created (see Bio::SeqFeature::AnnotationAdaptor).
>   . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_*
>     methods map directly onto Bio::AnnotationI's
>     Bio::AnnotationCollectionI instance.
> . Unflattener and Unflattener2 tests pass with no changes.
> . All tests pass.
>
> -Allen
>
>
> On Tue, 23 Nov 2004, Chris Mungall wrote:
>
> >
> > Unflattener.t is failing because someone has messed up get_tagset_values()
> > - this is a convenience method I originally added to SeqFeatureI. I'm not
> > familiar enough with the new changes and AnnotationCollections to fix
> > this.
> >
> > Surely the onus has always been on the person making changes to make sure
> > the test suite passes before committing their changes? In which case, how
> > did these changes make it in in the first place?
> >
> > On Tue, 23 Nov 2004, Jason Stajich wrote:
> >
> > >
> > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote:
> > >
> > > > On Tue, 23 Nov 2004, Jason Stajich wrote:
> > > >
> > > > >> I think if we just don't issue deprecation warnings it will be fine by
> > > > >> me -- even if we are just calling the new subroutine under the hood.
> > > > >> Tests seem to pass although Unflattener.t is falling over today; not
> > > > >> sure what the problem is.
> > > >
> > > > > that fails for me too, in addition to spewing out lots of
> > > > > diagnostics.
> > > > > however, if you run 'make test_Unflattener2', it passes.  strange.
> > > >
> > > no it is Unflattener not Unflattener2
> > >
> > > % make test_Unflattener
> > > [SNIP OUT SOME STUFF]
> > >
> > > -------------------- WARNING ---------------------
> > > MSG: get_tagset_values() is deprecated.  use get_Annotations()
> > > ---------------------------------------------------
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is
> > > not implemented by package Bio::SeqFeature::Generic.
> > >
> > >
> > > > -allen
> > > >
> > > >>
> > > >> -jason
> > > >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote:
> > > >>
> > > >>>
> > > >>>> On Friday, November 19, 2004, at 02:50  PM, Allen Day wrote:
> > > >>>>
> > > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI
> > > > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to
> > > > >>>>>   Bio::AnnotationCollectionI, marked as deprecated, and mapped to
> > > > >>>>>   their analogous and mostly pre-existing
> > > > >>>>>   Bio::AnnotationCollectionI methods.
> > > > >>>>>
> > > > >>>>>   Methods which were not in Bio::AnnotationCollectionI, but were in
> > > > >>>>>   Bio::Annotation::Collection and were necessary for *_tag_* method
> > > > >>>>>   remapping were created in Bio::AnnotationCollectionI.
> > > >>>
> > > >>> I've been paying some attention to this, but thought that the changes
> > > >>> were only those required to get Bio::FeatureIO working (i.e.
> > > >>> recapitulate GFF3 logic) without hampering object usage; do our tests
> > > >>> pass with these changes in place?
> > > >>>
> > > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote:
> > > >>>
> > > >>>> it has not been tagged yet.  I think Aaron is just really busy on
> > > >>>> this front.
> > > >>>
> > > >>> I did tag the HEAD at RC1, so we could branch from there if we needed
> > > >>> to; if this is really the big bug-bear that Hilmar and Jason are
> > > >>> claiming, then I'd ask Allen to retract his patches that alter
> > > >>> interface definitions, and branch.
> > > >>>
> > > >>> And I was so hoping to get RC2 packaged up later today ...
> > > >>>
> > > >>> -Aaron
> > > >>>
> > > >>> --
> > > >>> Aaron J. Mackey, Ph.D.
> > > >>> Dept. of Biology, Goddard 212
> > > >>> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> > > >>> 415 S. University Avenue         office: 215-898-1205
> > > >>> Philadelphia, PA  19104-6017     fax:    215-746-6697
> > > >>>
> > > >>>
> > > >> --
> > > >> Jason Stajich
> > > >> jason.stajich at duke.edu
> > > >> http://www.duke.edu/~jes12/
> > > >>
> > > >> _______________________________________________
> > > >> Bioperl-l mailing list
> > > >> Bioperl-l at portal.open-bio.org
> > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >>
> > > >
> > > >
> > > --
> > > Jason Stajich
> > > jason.stajich at duke.edu
> > > http://www.duke.edu/~jes12/
> > >
> > >
> >
>

