[Bioperl-l] WARNING INCOMING: collection consolidation
Paul Edlefsen
pedlefsen at systemsbiology.org
Wed Feb 26 19:18:15 EST 2003
Hilmar Lapp wrote:
> Just as an aside, a little more communication about what's going on in
> the freaky branch wouldn't hurt if this changes a lot of things (as
> opposed to adding things) and is ever to go into the main trunk ...
I agree, and I apologize if it seems mysterious.
Mostly it's the collection consolidation. I've been holding back from
checking in the bulk of my work on that because the same branch has been
used by Lincoln and others to test out some (basically unrelated) ideas
(relative locations and I think GFF3), and I don't want to check stuff
in until the tests pass for fear of breaking other people's ongoing work.
The unique identifier stuff is also unrelated and was a quick answer to
a short discussion that Lincoln and I had about the bulkiness of the
existing IdentifierI interface and my desire to have a lighter-weight
one that could unify the disparate concepts of 'unique identifier' that
I find confusing in BioPerl. It has so far remained sequestered on the
freaky branch because we've all had so many other things to squabble about.
The collection consolidation has been briefly mentioned on the list,
mostly as a warning because it will affect users of feature collections,
including DasI, GFF, and the gbrowse stuff. The discussion brought up a
lot of important issues that are still unresolved, particularly about a)
handling relative ranges, b) the relationship between sequences and
their annotations, and c) naming conventions. I have had to trudge
through with these things up in the air, so I've made some working
decisions: a) I've added seq_id() to RangeI, but have documented that it
can remain undef and that's okay; I've also created a RelRangeI (and an
implementation, RelRange) that adds accessor methods for absolute start,
end, and strand values, utility methods for conversion between absolute
and relative range values, and an absolute() flag for forcing
absoluteness (this all came from the Bio::DB::GFF::RelSegment class); my
new interface Bio::SeqFeature::SegmentI isa RelRangeI and it is the only
thing besides RelRange that presently extends/implements RelRangeI. b)
I'm just using the SeqFeatureI stuff as-is because I don't yet
understand the proposed new model; I'm a bit wary about how that will
work with the new Bio::SeqFeature::CollectionI stuff but I'm excited for
the challenge. c) I'm sticking with (the name)
Bio::SeqFeature::CollectionI for now because I'm lazy and we can't seem
to decide if it should be Bio::SeqFeatureCollectionI instead; this is a
minor change downstream if necessary.
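
As a rough illustration of the relative/absolute conversion described in (a), here is a minimal sketch. It is written in Python rather than Perl, and every name in it (RelRange, parent_start, rel2abs, abs2rel, the absolute attribute) is hypothetical; it is not the RelRangeI API. It assumes 1-based inclusive coordinates and a parent on the forward strand only, and it lets seq_id stay unset.

```python
from typing import Optional


class RelRange:
    """Hypothetical stand-in for a relative range; not the BioPerl class."""

    def __init__(self, start: int, end: int, strand: int = 1,
                 seq_id: Optional[str] = None, parent_start: int = 1):
        self._start = start                # relative, 1-based inclusive
        self._end = end
        self.strand = strand
        self.seq_id = seq_id               # allowed to stay None ("undef")
        self.parent_start = parent_start   # absolute position of relative 1
        self.absolute = False              # True: start()/end() report absolute values

    def rel2abs(self, pos: int) -> int:
        """Convert a relative position to an absolute one."""
        return pos + self.parent_start - 1

    def abs2rel(self, pos: int) -> int:
        """Convert an absolute position to a relative one."""
        return pos - self.parent_start + 1

    def abs_start(self) -> int:
        return self.rel2abs(self._start)

    def abs_end(self) -> int:
        return self.rel2abs(self._end)

    def start(self) -> int:
        return self.abs_start() if self.absolute else self._start

    def end(self) -> int:
        return self.abs_end() if self.absolute else self._end


r = RelRange(10, 20, parent_start=101)
print(r.start(), r.end())          # relative coordinates
print(r.abs_start(), r.abs_end())  # absolute coordinates
r.absolute = True
print(r.start(), r.end())          # the flag forces absolute values
```

A real implementation would also have to handle minus-strand parents, which this sketch deliberately leaves out.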
On the whole the plan is to make sure that things remain
backwards-compatible where possible. The collection consolidation
unites many existing classes that provide filtered access to feature
lists, including Bio::SeqFeature::CollectionI,
Bio::SeqFeature::Collection, Bio::Das, Bio::DasI, Bio::Das::Segment,
Bio::DB::GFF, Bio::DB::GFF::Segment. We've also made a new interface
for _providers_ of collections, to unify access to databases and DAS
servers and other things that store features. This is needed because
gbrowse currently gets unified access to Das and GFF data sources via
the DasI interface, which is poorly named and poorly placed for a
generic data access interface. The result is three new interfaces in
Bio::DB: Bio::DB::FeatureProviderI, Bio::DB::SequenceProviderI, and
Bio::DB::SegmentProviderI, where the latter is a simple extension of the
two former interfaces. SequenceProviderI isa Bio::DB::RandomAccessI and
a Bio::DB::UpdateableSeqI. All three interfaces provide a minimal core
set of methods for adding, retrieving, updating, and deleting (features
or sequences) from a data store.
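
To make the relationship between the three provider interfaces concrete, here is a minimal sketch, again in Python rather than Perl. The class and method names are assumptions made for illustration and do not reproduce the actual Bio::DB method signatures; the sketch also ignores the RandomAccessI/UpdateableSeqI ancestry of SequenceProviderI. It shows only the shape: two small add/get/update/remove interfaces, and a third that is simply their union.

```python
from abc import ABC, abstractmethod


class FeatureProvider(ABC):
    """Hypothetical analogue of a feature provider interface."""

    @abstractmethod
    def add_feature(self, feature_id, feature): ...

    @abstractmethod
    def get_feature(self, feature_id): ...

    @abstractmethod
    def update_feature(self, feature_id, feature): ...

    @abstractmethod
    def remove_feature(self, feature_id): ...


class SequenceProvider(ABC):
    """Hypothetical analogue of a sequence provider interface."""

    @abstractmethod
    def add_sequence(self, seq_id, seq): ...

    @abstractmethod
    def get_sequence(self, seq_id): ...

    @abstractmethod
    def update_sequence(self, seq_id, seq): ...

    @abstractmethod
    def remove_sequence(self, seq_id): ...


class SegmentProvider(FeatureProvider, SequenceProvider):
    """Adds nothing of its own here: just the union of the two interfaces."""


class InMemoryStore(SegmentProvider):
    """Toy data store backed by dictionaries (illustration only)."""

    def __init__(self):
        self._features = {}
        self._sequences = {}

    def add_feature(self, feature_id, feature):
        self._features[feature_id] = feature

    def get_feature(self, feature_id):
        return self._features.get(feature_id)

    def update_feature(self, feature_id, feature):
        if feature_id not in self._features:
            raise KeyError(feature_id)
        self._features[feature_id] = feature

    def remove_feature(self, feature_id):
        self._features.pop(feature_id, None)

    def add_sequence(self, seq_id, seq):
        self._sequences[seq_id] = seq

    def get_sequence(self, seq_id):
        return self._sequences.get(seq_id)

    def update_sequence(self, seq_id, seq):
        if seq_id not in self._sequences:
            raise KeyError(seq_id)
        self._sequences[seq_id] = seq

    def remove_sequence(self, seq_id):
        self._sequences.pop(seq_id, None)


store = InMemoryStore()
store.add_sequence("chr1", "ACGTACGT")
store.add_feature("gene1", {"seq_id": "chr1", "start": 2, "end": 5})
print(store.get_sequence("chr1"), store.get_feature("gene1"))
```

Client code written against FeatureProvider alone would then work with any backing store, which is the role the DasI interface currently plays for gbrowse.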
So far there's nothing (else) major here. Some existing things will be
deprecated, such as Bio::DB::GFF::RelSegment. Some existing things will
implement additional interfaces (e.g., those many collections will now
implement the common Bio::SeqFeature::CollectionI interface).
I do not think that this email will suffice as a request for comment,
but comments are welcome. When it gets closer to real (like when I can
get the tests to succeed and can check it all in to the freaky branch) I
will get back to this list with a real proposal and can refer people to
its working implementation. I hope that the initial investment will pay
off. This is all groundwork for an overhaul of gbrowse's data access
methodology, with the goal of making gbrowse more component-based and
allowing for multiple simultaneous data sources of more disparate types.
Thanks for reading all the way through this long message. Please accept
my apology if it seems that we have failed to solicit sufficient input
from the group; your comments will be appreciated.
:Paul