[Bioperl-l] feature holder for testing overlaps, etc
Jason Stajich
jason@cgt.mc.duke.edu
Wed, 15 May 2002 18:45:11 -0400 (EDT)
Here is the proposal for an in-memory SeqFeature collection interface
and object, tentatively called Bio::SeqFeature::FeatureCollectionI and
Bio::SeqFeature::Collection, which is analogous to the IntersectionGraph
that ChrisM described (maybe it can inherit from an InterfaceGraphI if
you want to help abstract those methods out).
SeqFeatureCollectionI interface
methods:

  add_features -- add a set of features to the collection

  features_in_range -- returns a list of the features that fall within
      a specified start & end, Range, or LocationI. Optionally takes
      strand into account in the same way the Range overlap/contains
      methods do. Accepts a flag that selects whether to test for
      features that overlap the range or are completely contained
      in it.

  get_features(-tag => $tag) -- returns a list of the features that
      have the requested tag (this will only be more efficient than
      grepping on the list if the # of features is large).
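
To make the three methods concrete, here is a minimal sketch of how they might fit together. It is written in Python rather than Perl purely for illustration and is not the proposed BioPerl implementation: the `Feature` and `FeatureCollection` classes and the `strand` and `contain` parameters are assumptions. The strand test is simplified to an exact match, which is cruder than the Range strand handling, and the lookups are plain linear scans.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature: coordinates, strand, tags."""
    start: int
    end: int
    strand: int = 0                      # 1, -1, or 0 for unstranded
    tags: dict = field(default_factory=dict)


class FeatureCollection:
    """Linear-scan sketch of the proposed collection interface."""

    def __init__(self):
        self._features = []

    def add_features(self, features):
        """Add a set of features; return the new collection size."""
        self._features.extend(features)
        return len(self._features)

    def features_in_range(self, start, end, strand=None, contain=False):
        """Return features overlapping start..end, or, when contain is
        true, only those lying completely inside it.

        strand=None ignores strand; otherwise the feature strand must
        match exactly (a simplification for this sketch).
        """
        hits = []
        for feat in self._features:
            if strand is not None and feat.strand != strand:
                continue
            if contain:
                ok = start <= feat.start and feat.end <= end
            else:
                ok = feat.start <= end and feat.end >= start
            if ok:
                hits.append(feat)
        return hits

    def get_features(self, tag):
        """Return features that carry the requested tag."""
        return [feat for feat in self._features if tag in feat.tags]
```

With features at 10..20, 15..40 and 50..60, a query for 12..30 returns the first two in overlap mode, while a query for 5..25 with `contain=True` returns only the first.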
It could be reasonable to let Bio::Seq objects use a
SeqFeatureCollection to hold their features depending on the
efficiency here - but one thing at a time.
Bio::SeqFeature::Collection would be implemented with a BDB B-Tree and
use Lincoln's bin method from Bio::DB::GFF::Util::Binning. I'm not
sure how to get things that fall within a range from the BDB B-Tree
interface - I'd have to pull from a sorted list somehow, and most of
the examples are for duplicate hash keys. Hints appreciated.
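
One possible shape for the range lookup, sketched in Python with a sorted list standing in for the B-Tree: key each feature by (tier, bin), then for each tier seek to the first key at or after the query's first bin and scan forward until the bin index passes the query's last bin. The tier sizes, the catch-all tier, and the names `bin_key` and `BinnedIndex` are assumptions made for this sketch, and the binning is a simplified scheme, not the Bio::DB::GFF::Util::Binning code. With DB_File, the seek step would presumably correspond to positioning a cursor via seq() with the R_CURSOR flag and stepping with R_NEXT.

```python
import bisect

# Hypothetical tier sizes: 1 kb bins up to 100 Mb bins, by powers of ten.
TIERS = [1000 * 10 ** i for i in range(6)]
CATCH_ALL = 0  # pseudo-tier for features that fit in no single bin


def bin_key(start, end):
    """Return (tier_size, bin_index) of the smallest bin holding start..end."""
    for size in TIERS:
        if start // size == end // size:
            return (size, start // size)
    return (CATCH_ALL, 0)


class BinnedIndex:
    """Sorted rows of (tier, bin, start, end, name); stands in for a B-Tree."""

    def __init__(self):
        self._rows = []

    def add(self, name, start, end):
        tier, idx = bin_key(start, end)
        bisect.insort(self._rows, (tier, idx, start, end, name))

    def overlapping(self, qstart, qend):
        """Return sorted names of features overlapping qstart..qend."""
        hits = []
        for tier in [CATCH_ALL] + TIERS:
            if tier == CATCH_ALL:
                lo, hi = 0, 0
            else:
                lo, hi = qstart // tier, qend // tier
            # Seek to the first row whose key is >= (tier, lo) ...
            pos = bisect.bisect_left(self._rows, (tier, lo))
            # ... then walk forward while still inside this tier's bins.
            while pos < len(self._rows):
                t, b, start, end, name = self._rows[pos]
                if t != tier or b > hi:
                    break
                if start <= qend and end >= qstart:
                    hits.append(name)
                pos += 1
        return sorted(hits)
```

Each query touches only the bins it can intersect in each tier, so a short range skips most small features even when the collection is large; the final coordinate check is still needed because sharing a bin does not imply overlap.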
-jason
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu