[Bioperl-l] WARNING INCOMING: collection consolidation
Ewan Birney
birney at ebi.ac.uk
Thu Feb 27 12:07:16 EST 2003
> The unique identifier stuff is also unrelated and was a quick answer to
> a short discussion that Lincoln and I had about the bulkiness of the
> existing IdentifierI interface and my desire to have a lighter-weight
> one that could unify the disparate concepts of 'unique identifier' that
> I find confusing in BioPerl. It has so far remained sequestered on the
> freak branch because we've all had so many other things to squabble about.
>
I quite like the Identifier stuff, but I suspect that if we do it we
should try to apply it as consistently as possible across the entire set.
Am I right in thinking that one of your classes is:
Uniquely-Identifiable-Object-For-This-Implementation-but-not-exportable-ids
and the other one is
Uniquely-Identifiable-Object-For-Planet-Bioinformatics-and-so-exportable/queryable-ids
I certainly find these two concepts separable and useful to distinguish,
which was why, back in ancient history, on Bio::PrimarySeqI I had
primary_id - the non-world-visible one - and, with a really stupid name,
accession_number - the world-visible one.
If I am right, what are your object names? If I am wrong... can you
enlighten me...?
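To make the distinction concrete, here is a rough sketch in Python rather than BioPerl. The class names, the counter, and the helper functions are all hypothetical; only primary_id and accession_number come from Bio::PrimarySeqI. It assumes the local id is merely unique inside one running implementation, while the accession is the value one would export or query by.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical source of implementation-local ids (unique per process only).
_local_ids = itertools.count(1)


@dataclass
class LocallyIdentifiable:
    """Unique within this implementation; not meant to be exported."""
    primary_id: int


@dataclass
class GloballyIdentifiable(LocallyIdentifiable):
    """Also carries a world-visible id that can be exported and queried."""
    accession_number: Optional[str] = None


def new_record(accession: Optional[str] = None) -> GloballyIdentifiable:
    # Every object gets a fresh local id, whether or not it has an accession.
    return GloballyIdentifiable(primary_id=next(_local_ids),
                                accession_number=accession)


def find_by_accession(records: List[GloballyIdentifiable],
                      accession: str) -> List[GloballyIdentifiable]:
    # Only the world-visible id is a sensible query key.
    return [r for r in records if r.accession_number == accession]
```

Two objects loaded from the same public entry would share an accession_number but still differ in primary_id, which is the separation described above.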
> The collection consolidation has been briefly mentioned on the list,
> mostly as a warning because it will affect users of feature collections,
> including DasI, GFF, and the gbrowse stuff. The discussion brought up a
> lot of important issues that are still unresolved, particularly about a)
> handling relative ranges, b) the relationship between sequences and
> their annotations, and c) naming conventions. I have had to trudge
> through with these things up in the air, so I've made some working
> decisions: a) I've added seq_id() to RangeI, but have documented that it
> can remain undef and that's okay; I've also created a RelRangeI (and an
> implementation, RelRange) that adds accessor methods for absolute start,
> end, and strand values, utility methods for conversion between absolute
> and relative range values, and an absolute() flag for forcing
> absoluteness (this all came from the Bio::DB::GFF::RelSegment class); my
> new interface Bio::SeqFeature::SegmentI isa RelRangeI and it is the only
> thing besides RelRange that presently extends/implements RelRangeI. b)
> I'm just using the SeqFeatureI stuff as-is because I don't yet
> understand the proposed new model; I'm a bit wary about how that will
> work with the new Bio::SeqFeature::CollectionI stuff but I'm excited for
> the challenge. c) I'm sticking with (the name)
> Bio::SeqFeature::CollectionI for now because I'm lazy and we can't seem
> to decide if it should be Bio::SeqFeatureCollectionI instead; this is a
> minor change downstream if necessary.
>
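The relative/absolute range idea in (a) can be sketched as follows. This is a Python illustration and not the RelRangeI API: the field and method names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the reference strand is assumed to be +1 or -1. The optional seq_id mirrors the note that it can remain undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelRange:
    start: int                    # relative start (1-based, inclusive)
    end: int                      # relative end (1-based, inclusive)
    strand: int                   # +1 or -1, relative to the reference
    ref_start: int                # absolute start of the reference range
    ref_end: int                  # absolute end of the reference range
    ref_strand: int = 1           # absolute strand of the reference
    seq_id: Optional[str] = None  # allowed to stay undefined

    def rel2abs(self, pos: int) -> int:
        # On a forward reference, count up from its start;
        # on a reverse reference, count down from its end.
        if self.ref_strand >= 0:
            return self.ref_start + pos - 1
        return self.ref_end - pos + 1

    def abs2rel(self, pos: int) -> int:
        # Inverse of rel2abs under the same assumptions.
        if self.ref_strand >= 0:
            return pos - self.ref_start + 1
        return self.ref_end - pos + 1

    def abs_start(self) -> int:
        return min(self.rel2abs(self.start), self.rel2abs(self.end))

    def abs_end(self) -> int:
        return max(self.rel2abs(self.start), self.rel2abs(self.end))

    def abs_strand(self) -> int:
        return self.strand * (1 if self.ref_strand >= 0 else -1)
```

Taking min and max in abs_start() and abs_end() keeps start <= end in absolute coordinates even when the reference lies on the reverse strand.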
> On the whole the plan is to make sure that things remain
> backwards-compatible where possible. The collection consolidation
> unites many existing classes that provide filtered access to feature
> lists, including Bio::SeqFeature::CollectionI,
> Bio::SeqFeature::Collection, Bio::Das, Bio::DasI, Bio::Das::Segment,
> Bio::DB::GFF, Bio::DB::GFF::Segment. We've also made a new interface
> for _providers_ of collections, to unify access to databases and DAS
> servers and other things that store features. The need for this is that
> gbrowse currently gets unified access to Das and GFF data sources via
> the DasI interface, which is poorly named and poorly placed for a
> generic data access interface. The result is three new interfaces in
> Bio::DB: Bio::DB::FeatureProviderI, Bio::DB::SequenceProviderI, and
> Bio::DB::SegmentProviderI, where the latter is a simple extension of the
> two former interfaces. SequenceProviderI isa Bio::DB::RandomAccessI and
> a Bio::DB::UpdateableSeqI. All three interfaces provide a minimal core
> set of methods for adding, retrieving, updating, and deleting (features
> or sequences) from a data store.
>
This jibes well with me. At Singapore I proposed a reordering of the
classes to deal with the "multiple coordinate system" (one feature being
on - say - 3 coordinate systems, being genomic, contig and cDNA) whilst
neatly maintaining backward compatibility of SeqFeatures and - very
attractively in my view - unifying the objects to store annotation about a
feature with the objects to store annotation about a sequence.
Did my proposal make sense? I think your Bio::DB::FeatureProviderI is very
close to my proposed Bio::Seq::CoordinateManagerI and/or
Bio::Seq::FeatureCollectionI.
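For anyone trying to picture the provider layering quoted above, a minimal sketch in Python might look like this. It is not the Bio::DB code: the method names, the dict-based feature records, and the in-memory store are assumptions made for illustration. The only point taken from the description is that the segment provider is a simple extension of the feature and sequence providers.

```python
from abc import ABC, abstractmethod


class FeatureProvider(ABC):
    """Adds, retrieves, and deletes features from some data store."""

    @abstractmethod
    def add_features(self, *features): ...

    @abstractmethod
    def get_features(self, predicate=None): ...

    @abstractmethod
    def remove_features(self, *features): ...


class SequenceProvider(ABC):
    """Adds and retrieves sequences by id."""

    @abstractmethod
    def add_sequence(self, seq_id, residues): ...

    @abstractmethod
    def get_sequence(self, seq_id): ...


class SegmentProvider(FeatureProvider, SequenceProvider):
    """Simple extension of the two interfaces above."""


class InMemoryProvider(SegmentProvider):
    """Toy store; a database or DAS client would sit behind the same methods."""

    def __init__(self):
        self._features = []
        self._sequences = {}

    def add_features(self, *features):
        self._features.extend(features)

    def get_features(self, predicate=None):
        # Filtered access: with no predicate, return a copy of everything.
        if predicate is None:
            return list(self._features)
        return [f for f in self._features if predicate(f)]

    def remove_features(self, *features):
        self._features = [f for f in self._features if f not in features]

    def add_sequence(self, seq_id, residues):
        self._sequences[seq_id] = residues

    def get_sequence(self, seq_id):
        return self._sequences[seq_id]
```

A caller such as a genome browser would then depend only on SegmentProvider and would not need to know whether a GFF database or a DAS server is behind it.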
Aaron is planning to do some commentary about this. Realistically we do
need to all get into the same room. Don't suppose you can fly
Seattle-->NY in the next couple of days?
> So far there's nothing (else) major here. Some existing things will be
> deprecated, such as Bio::DB::GFF::RelSegment. Some existing things will
> implement additional interfaces (e.g. those many collections will now
> implement the common Bio::SeqFeature::CollectionI interface).
>
> I do not think that this email will suffice as a request for comment,
> but comments are welcome. When it gets closer to real (like when I can
> get the tests to succeed and can check it all in to the freaky branch) I
> will get back to this list with a real proposal and can refer people to
> its working implementation. I hope that the initial investment will pay
> off. This is all groundwork for an overhaul of gbrowse's data access
> methodology, with the goal of making gbrowse more component-based and
> allowing for multiple simultaneous data sources of more disparate types.
>
> Thanks for reading all the way through this long message. Please accept
> my apology if it seems that we have failed to solicit sufficient input
> from the group; your comments will be appreciated.
More communication.... good. We probably need a 3rd party (Aaron) to
produce the final insights....
>
> :Paul
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>