[Bioperl-l] feature holder for testing overlaps, etc
Lincoln Stein
lstein@cshl.org
Mon, 20 May 2002 17:19:15 -0400
I like Bio::SeqFeature::Collection just fine. Unfortunately the equivalent
in the Bio::Graphics module is named Bio::Graphics::FeatureFile, because it
started out life as a parser for files containing a list of features, and
then morphed into a generalized collection of features. Probably time to
change the name.
Lincoln
On Monday 20 May 2002 16:13, Jason Stajich wrote:
> On Mon, 20 May 2002, Lincoln Stein wrote:
> > Hi Jason,
> >
> > Would it be OK to overlay the DasI interface on top of
> > features_in_range() and get_features()? Then gbrowse will run on top of
> > it.
>
> That sounds like a great idea. I'll look at the interface and see what it
> would take to implement it. Is Bio::SeqFeature::Collection an okay name
> in everyone's mind?
>
> > What if I want to combine those two methods to return features of a
> > particular type that fall inside a particular range? This is a very
> > common optimization and will greatly help performance if implemented
> > correctly. The DasI overlapping_features() method works this way. There
> > are also the following methods:
>
> I was thinking about something like this just this morning -
> perhaps gbrowse could allow a set of features (and their
> associated sequences) to be selected based on a feature range and/or some
> feature metadata like:
>
> $f->has_tag('gene') && grep { /$GENE/ } $f->each_tag_value('gene')
>
> Let's tackle this after I get the range query working.
>
> > contained_features() -- find features that are contained inside range
> > contained_in() -- find features that completely contain a
> > range
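
To make the three range relations concrete, here is a minimal sketch. It assumes features are plain hash references with inclusive start and end coordinates; the helper names (overlaps, is_inside, encloses, names_where) are made up for illustration and are not DasI or BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical features and query range, inclusive coordinates.
my @features = (
    { name => 'geneA', start => 100, end => 500 },
    { name => 'exon1', start => 150, end => 200 },
    { name => 'geneB', start => 450, end => 900 },
);
my $range = { start => 140, end => 480 };

# True if the feature and the range share at least one position.
sub overlaps {
    my ($f, $r) = @_;
    return $f->{start} <= $r->{end} && $f->{end} >= $r->{start};
}

# True if the feature lies entirely inside the range.
sub is_inside {
    my ($f, $r) = @_;
    return $f->{start} >= $r->{start} && $f->{end} <= $r->{end};
}

# True if the feature completely contains the range.
sub encloses {
    my ($f, $r) = @_;
    return is_inside($r, $f);
}

# Names of the features that pass the given test against the range.
sub names_where {
    my ($test, $r, @f) = @_;
    return map { $_->{name} } grep { $test->($_, $r) } @f;
}

print "overlapping: ", join(' ', names_where(\&overlaps,  $range, @features)), "\n";
print "contained:   ", join(' ', names_where(\&is_inside, $range, @features)), "\n";
print "containing:  ", join(' ', names_where(\&encloses,  $range, @features)), "\n";
```

A linear scan like this is only a reference for what each query should return; the point of the B-Tree work is to avoid visiting every feature.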
> >
> > The way to fetch a range with a B-Tree is to use the DB_File
> > object-oriented seq() method with the R_CURSOR flag. This has to be
> > coupled with a custom comparison function that compares keys
> > numerically, and the R_DUP flag so that you can store and fetch
> > duplicate keys. See the DB_File documentation for examples of this.
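
A minimal sketch of that approach, assuming an in-memory B-Tree keyed on feature start with the feature name as the value. The data and the starts_in_range helper are hypothetical; the DB_File calls (BTREEINFO, seq, R_CURSOR, R_NEXT, R_DUP) are the standard ones.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# B-Tree settings: compare keys as numbers and allow duplicate keys.
my $info = DB_File::BTREEINFO->new;
$info->{compare} = sub { $_[0] <=> $_[1] };
$info->{flags}   = R_DUP;

# An undef filename keeps the database in memory.
my %h;
my $db = tie %h, 'DB_File', undef, O_RDWR | O_CREAT, 0666, $info
    or die "Cannot create B-Tree: $!";

# Hypothetical data: key = feature start, value = feature name.
my @features = ([100, 'geneA'], [250, 'exon1'], [250, 'exon2'],
                [900, 'geneB'], [1500, 'geneC']);
$db->put($_->[0], $_->[1]) for @features;

# Return [start, name] pairs whose start lies in [$lo, $hi].
sub starts_in_range {
    my ($db, $lo, $hi) = @_;
    my @hits;
    my ($key, $value) = ($lo, '');
    # R_CURSOR positions on the smallest key >= $key; R_NEXT walks forward.
    for (my $status = $db->seq($key, $value, R_CURSOR);
         $status == 0;
         $status = $db->seq($key, $value, R_NEXT)) {
        last if $key > $hi;    # keys are sorted, so we can stop here
        push @hits, [$key, $value];
    }
    return @hits;
}

my @hits = starts_in_range($db, 200, 1000);
print join(' ', map { "$_->[0]:$_->[1]" } @hits), "\n";
```

Keying on start alone only finds features that begin inside the range. A feature that begins earlier and extends into the range is missed, which is one reason to key on bins instead.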
>
> Thanks Lincoln.
>
> I almost have it working. I just figured out you probably can't mix
> get_dup calls within the calls to the cursor iterator or else you'll only
> get keys which have >1 value. I'll commit the code and tests tonight if
> it all works and we can expand from there.
>
> -jason
>
> > Lincoln
> >
> > On Wednesday 15 May 2002 18:45, Jason Stajich wrote:
> > > > Here is the proposal for an in-memory SeqFeature collection interface
> > > > and object tentatively called Bio::SeqFeature::FeatureCollectionI and
> > > > Bio::SeqFeature::Collection - which is analogous to ChrisM's described
> > > > IntersectionGraph (maybe it can inherit from an InterfaceGraphI if
> > > > you want to help abstract those methods out).
> > >
> > > SeqFeatureCollectionI interface
> > > methods:
> > > add_features -- add a set of features to the collection
> > >
> > > > features_in_range -- returns a list of features that fall within a
> > > >    specified start & end, range, or LocationI. Optionally takes
> > > >    strand into account in the same way the Range overlap/contains
> > > >    methods do. Accepts a flag saying whether to test for features
> > > >    that overlap or are completely contained.
> > > > get_features(-tag => $tag) -- returns a list of features that have the
> > > >    requested tag (this will only be more efficient than grepping on
> > > >    the list if the number of features is large).
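
The three proposed methods can be sketched as a small class. This is an illustration only: the package name Sketch::Collection, the -contain flag, and the hash-based features are assumptions, it ignores strand, and it uses a linear scan rather than the B-Tree the proposal has in mind.

```perl
use strict;
use warnings;

package Sketch::Collection;

sub new {
    my ($class) = @_;
    return bless { features => [] }, $class;
}

# Add one or more features; returns the number now stored.
sub add_features {
    my ($self, @features) = @_;
    push @{ $self->{features} }, @features;
    return scalar @{ $self->{features} };
}

# Features overlapping [-start, -end], or fully contained if -contain is true.
sub features_in_range {
    my ($self, %args) = @_;
    my ($s, $e, $contain) = @args{qw(-start -end -contain)};
    return grep {
        my $f = $_;
        $contain ? ($f->{start} >= $s && $f->{end} <= $e)
                 : ($f->{start} <= $e && $f->{end} >= $s);
    } @{ $self->{features} };
}

# Features that carry the requested tag.
sub get_features {
    my ($self, %args) = @_;
    my $tag = $args{-tag};
    return grep { exists $_->{tags}{$tag} } @{ $self->{features} };
}

package main;

my $c = Sketch::Collection->new;
$c->add_features(
    { name => 'geneA', start => 100, end => 500, tags => { gene => ['abc1'] } },
    { name => 'exon1', start => 150, end => 200, tags => { exon => ['1'] } },
    { name => 'geneB', start => 450, end => 900, tags => { gene => ['xyz2'] } },
);

# Combine a range query with a tag-value match.
my $gene = 'abc';
my @hits = grep {
    my $f = $_;
    grep { /$gene/ } @{ $f->{tags}{gene} || [] };
} $c->features_in_range(-start => 140, -end => 480);
print join(' ', map { $_->{name} } @hits), "\n";
```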
> > >
> > > It could be reasonable to let Bio::Seq objects use a
> > > SeqFeatureCollection to hold their features depending on the
> > > efficiency here - but one thing at a time.
> > >
> > > > Bio::SeqFeature::Collection would be implemented with a BDB B-Tree and
> > > > use Lincoln's bin method from Bio::DB::GFF::Util::Binning. I'm not
> > > > sure how to get things that fall within a range from the BDB B-Tree
> > > > interface - I'd have to pull from a sorted list somehow, and most of
> > > > the examples are for duplicate hash keys. Hints appreciated.
> > >
> > > -jason
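
For readers unfamiliar with binning, here is a rough sketch of the general idea. It is not the code in Bio::DB::GFF::Util::Binning; the function names, the minimum bin width of 1000, and the factor of 10 between tiers are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Smallest bin width and growth factor between tiers (illustrative values).
my $MIN_BIN = 1000;
my $FACTOR  = 10;

# Return "tier.index" for the smallest bin that holds the whole feature.
sub bin_for {
    my ($start, $end) = @_;
    my $tier = $MIN_BIN;
    # Widen the bins until start and end land in the same one.
    $tier *= $FACTOR while int($start / $tier) != int($end / $tier);
    return sprintf '%d.%d', $tier, int($start / $tier);
}

# Return every bin, at every tier up to $max_tier, that touches the range.
sub bins_for_range {
    my ($start, $end, $max_tier) = @_;
    my @bins;
    for (my $tier = $MIN_BIN; $tier <= $max_tier; $tier *= $FACTOR) {
        push @bins, "$tier.$_" for int($start / $tier) .. int($end / $tier);
    }
    return @bins;
}

print bin_for(1200, 1800), "\n";
print join(' ', bins_for_range(1500, 2500, 100_000)), "\n";
```

With the bin name as the B-Tree key, a range query becomes a handful of key lookups with duplicates allowed, followed by an exact overlap test on the candidates, since sharing a bin does not guarantee an overlap.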