[Bioperl-l] feature holder for testing overlaps, etc
Lincoln Stein
lstein@cshl.org
Mon, 20 May 2002 15:58:18 -0400
Hi Jason,
Would it be OK to overlay the DasI interface on top of features_in_range()
and get_features()? Then gbrowse will run on top of it.
What if I want to combine those two methods to return features of a
particular type that fall inside a particular range? This is a very common
optimization and will greatly help performance if implemented correctly. The
DasI overlapping_features() method works this way. There are also the
following methods:
contained_features() -- find features that are contained inside range
contained_in() -- find features that completely contain a range
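To make the three range relationships concrete, here is a minimal in-memory sketch. It assumes each feature is a plain hash with type, start, and end keys; the features_in_range() signature and the %RANGE_TEST table are hypothetical names for illustration only, not the DasI or Bio::SeqFeature API. It filters on type and range in a single pass, which is the combination asked about above.

```perl
use strict;
use warnings;

# Hypothetical feature layout: { type => ..., start => ..., end => ... },
# with start <= end. Each test takes (feature, range start, range end).
my %RANGE_TEST = (
    # feature shares at least one position with the range
    overlaps     => sub { $_[0]{start} <= $_[2] && $_[0]{end} >= $_[1] },
    # feature lies entirely inside the range
    contained    => sub { $_[0]{start} >= $_[1] && $_[0]{end} <= $_[2] },
    # feature completely covers the range
    contained_in => sub { $_[0]{start} <= $_[1] && $_[0]{end} >= $_[2] },
);

# Return the features matching the range test and, optionally, a type.
sub features_in_range {
    my ($features, %arg) = @_;
    my ($s, $e) = @arg{qw(start end)};
    my $mode = $arg{mode} || 'overlaps';
    my $test = $RANGE_TEST{$mode} or die "Unknown mode '$mode'";
    return grep {
        (!defined $arg{type} || $_->{type} eq $arg{type})
            && $test->($_, $s, $e)
    } @$features;
}
```

A linear grep like this only pins down the semantics; the performance gain comes from pushing both the type and the range test down into the index.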
The way to fetch a range with a B-Tree is to use the DB_File object-oriented
seq() method with the R_CURSOR flag, which positions the cursor at the
smallest key greater than or equal to the one you pass in. This has to be
coupled with a custom comparison function that compares keys numerically,
and the appropriate flag (R_DUP) to allow duplicate keys. See the DB_File
documentation for examples of this.
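A minimal sketch of that cursor walk, assuming features are keyed by their numeric start position. The starts_in_range() helper and the [start, label] layout are made up for illustration; the DB_File calls (DB_File::BTREEINFO, the compare and flags settings, put(), and seq() with R_CURSOR / R_NEXT) are the documented ones. Passing undef as the filename keeps the tree in memory.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;

# Hypothetical helper: $features is a list of [start, label] pairs.
# Returns the [start, label] pairs with $lo <= start <= $hi, in key order.
sub starts_in_range {
    my ($features, $lo, $hi) = @_;

    my $btree = DB_File::BTREEINFO->new;
    $btree->{compare} = sub { $_[0] <=> $_[1] };   # numeric key order
    $btree->{flags}   = R_DUP;                     # allow duplicate keys

    my %h;
    my $db = tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $btree
        or die "Cannot create B-Tree: $!";

    $db->put($_->[0], $_->[1]) for @$features;

    my @hits;
    my ($key, $value) = ($lo, '');
    # R_CURSOR jumps to the first key >= $lo; R_NEXT then steps forward
    # through the keys (including duplicates) until we pass $hi.
    for (my $status = $db->seq($key, $value, R_CURSOR);
         $status == 0 && $key <= $hi;
         $status = $db->seq($key, $value, R_NEXT)) {
        push @hits, [$key, $value];
    }

    undef $db;      # drop the object reference before untie
    untie %h;
    return @hits;
}
```

Note that this only finds features that start inside the range. Catching features that start earlier but extend into the range needs something extra on top, such as the binning scheme.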
Lincoln
On Wednesday 15 May 2002 18:45, Jason Stajich wrote:
> Here is the proposal for an in-memory SeqFeature collection interface
> and object tentatively called Bio::SeqFeature::FeatureCollectionI and
> Bio::SeqFeature::Collection - which is analogous to ChrisM's described
> IntersectionGraph (maybe it can inherit from an InterfaceGraphI if
> you want to help abstract those methods out).
>
> SeqFeatureCollectionI interface
> methods:
> add_features -- add a set of features to the collection
>
> features_in_range -- returns a list of features that are contained in
> a specified start & end, Range, or LocationI.
> Optionally taking into account strand in the same
> way the Range overlap/contains methods do.
> Accept a flag as to whether to test for features
> that overlap or are completely contained.
> get_features(-tag => $tag) -- returns a list of features that have the
> requested tag (this will only be more efficient
> than grepping on the list if the # of features is
> large).
>
> It could be reasonable to let Bio::Seq objects use a
> SeqFeatureCollection to hold their features depending on the
> efficiency here - but one thing at a time.
>
> Bio::SeqFeature::Collection would be implemented with a BDB B-Tree and
> use Lincoln's bin method from Bio::DB::GFF::Util::Binning. I'm not
> sure how to get things that fall within a range from the BDB B-Tree
> interface - have to pull from a sorted list somehow and most of the
> examples are for duplicate hash keys, hints appreciated.
>
> -jason