[Bioperl-l] feature holder for testing overlaps, etc

Lincoln Stein lstein@cshl.org
Mon, 20 May 2002 15:58:18 -0400


Hi Jason,

Would it be OK to overlay the DasI interface on top of features_in_range() 
and get_features()?  Then gbrowse will run on top of it.

What if I want to combine those two methods to return features of a 
particular type that fall inside a particular range?  This is a very common 
optimization and will greatly help performance if implemented correctly.  The 
DasI overlapping_features() method works this way.  There are also the 
following methods:

	contained_features()  -- find features that are contained inside a range
	contained_in()        -- find features that completely contain a range
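
To make the three range relationships concrete, here is a minimal sketch in Python. It assumes inclusive start/end coordinates and ignores strand. The function names are borrowed from the DasI methods above, but the signatures, the Feature class, and the `_matches` helper are hypothetical, and the linear scan stands in for whatever index the real implementation uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a sequence feature."""
    type: str
    start: int
    end: int


def _matches(feature, type):
    # A type of None means "any type".
    return type is None or feature.type == type


def overlapping_features(features, start, end, type=None):
    """Features of the given type that share at least one base with the range."""
    return [f for f in features
            if _matches(f, type) and f.start <= end and f.end >= start]


def contained_features(features, start, end, type=None):
    """Features of the given type that lie entirely inside the range."""
    return [f for f in features
            if _matches(f, type) and f.start >= start and f.end <= end]


def contained_in(features, start, end, type=None):
    """Features of the given type that completely contain the range."""
    return [f for f in features
            if _matches(f, type) and f.start <= start and f.end >= end]
```

Filtering on type and range in the same pass is the point of combining the two methods: the caller never materializes the full list of features in the range just to throw most of them away.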

The way to fetch a range with a B-Tree is to use the DB_File object-oriented 
seq() method with the R_CURSOR flag, which positions the cursor at the first 
key greater than or equal to the one you ask for.  This has to be coupled 
with a custom comparison function that compares keys numerically, and the 
R_DUP flag to allow duplicate keys.  See the DB_File documentation for 
examples of this.  
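
The cursor idea can be sketched without DB_File at all. The following Python stand-in is not the DB_File API: a sorted list plus bisect plays the role of the B-Tree, and the class and method names are hypothetical. It shows the three ingredients: numeric key order, duplicate keys, and a seek to the first key >= the range start followed by a forward walk.

```python
import bisect


class SortedFeatureIndex:
    """Stand-in for a B-Tree keyed on numeric start position.

    Duplicate keys are allowed, as they would be with R_DUP.
    """

    def __init__(self):
        self._keys = []    # start positions, kept in numeric order
        self._values = []  # feature payloads, parallel to _keys

    def put(self, start, feature):
        # bisect_right keeps duplicates in insertion order.
        i = bisect.bisect_right(self._keys, start)
        self._keys.insert(i, start)
        self._values.insert(i, feature)

    def starts_in_range(self, lo, hi):
        """Return payloads whose start key lies in [lo, hi]."""
        # Seek to the first key >= lo (the role seq() with R_CURSOR plays).
        i = bisect.bisect_left(self._keys, lo)
        found = []
        # Walk forward until the key passes hi (the role R_NEXT plays).
        while i < len(self._keys) and self._keys[i] <= hi:
            found.append(self._values[i])
            i += 1
        return found
```

The numeric comparison matters because the default ordering is lexical: as strings, "100" sorts before "20" and "9", so a forward walk from the cursor would visit keys in the wrong order and stop too early or too late.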

Lincoln


On Wednesday 15 May 2002 18:45, Jason Stajich wrote:
> Here is the proposal for an in-memory SeqFeature collection interface
> and object tentatively called Bio::SeqFeature::FeatureCollectionI and
> Bio::SeqFeature::Collection - which is analogous to ChrisM's described
> IntersectionGraph (maybe it can inherit from an InterfaceGraphI if
> you want to help abstract those methods out).
>
> SeqFeatureCollectionI interface
> methods:
> add_features    -- add a set of features to the collection
>
> features_in_range -- returns a list of features that are contained in
> 		     a specified start & end, range or LocationI.
> 		     Optionally taking into account strand in the same
> 		     way the Range overlap/contains methods do.
> 		     Accept a flag as to whether to test for features
> 		     that overlap or are completely contained.
> get_features(-tag => $tag) - returns a list of features that have the
> 		     requested tag (this will only be more efficient
> 		     than grepping on the list if the # of features is
> 		     large).
>
> It could be reasonable to let Bio::Seq objects use a
> SeqFeatureCollection to hold their features depending on the
> efficiency here - but one thing at a time.
>
> Bio::SeqFeature::Collection would be implemented with a BDB B-Tree and
> use Lincoln's bin method from Bio::DB::GFF::Util::Binning.  I'm not
> sure how to get things that fall within a range from the BDB B-Tree
> interface - have to pull from a sorted list somehow and most of the
> examples are for duplicate hash keys, hints appreciated.
>
> -jason