[Bioperl-l] Re: LocationI
Hilmar Lapp
hlapp@gmx.net
Thu, 18 Jan 2001 11:11:57 -0800
Jason Stajich wrote:
>
> Interfaces:
>
> Bio::LocationI -> ISA RangeI
> Purpose: capture location information - such as in an EMBL/GenBank
> feature
> /source 1..345
> Methods: RangeI methods, and ...? [start/end/strand]
>
> Questions: How is a LocationI object going to be different from the
> vanilla SeqFeatureI? Or should we migrate some methods
> (start/end/strand) from SeqFeature to LocationI and make
> SeqFeatureI more about tags (primary/source/has_tag/each_tag)
> and GFF stuff?
In principle I think yes. SeqFeatureI could still keep
start/end/strand and map these to calls into the location object.
Or, SeqFeatureI loses them (i.e., they are no longer mandatory), but
for simplicity SeqFeature::Generic keeps them.
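A minimal sketch of the first option follows. It uses Python rather than Perl, and the class names (`SimpleLocation`, `Feature`) are hypothetical, not BioPerl's API. The feature holds the tags, and its start/end/strand accessors only forward to the location object. That object also answers the range questions (overlap, containment) that RangeI would provide.

```python
class SimpleLocation:
    """Hypothetical contiguous location, e.g. EMBL/GenBank '1..345'."""

    def __init__(self, start, end, strand=1):
        if start > end:
            raise ValueError("start must not exceed end")
        self.start = start
        self.end = end
        self.strand = strand

    def overlaps(self, other):
        # Closed intervals overlap unless one lies wholly before the other.
        return self.start <= other.end and other.start <= self.end

    def contains(self, other):
        return self.start <= other.start and other.end <= self.end


class Feature:
    """Hypothetical feature: tags live here, coordinates live in the location."""

    def __init__(self, primary_tag, location, tags=None):
        self.primary_tag = primary_tag
        self.location = location
        self.tags = dict(tags or {})

    # start/end/strand are kept for convenience but only delegate.
    @property
    def start(self):
        return self.location.start

    @property
    def end(self):
        return self.location.end

    @property
    def strand(self):
        return self.location.strand

    def has_tag(self, name):
        return name in self.tags


src = Feature("source", SimpleLocation(1, 345))
print(src.start, src.end, src.strand)  # 1 345 1
```

If the accessors were later dropped from the interface, callers would simply use `feature.location.start` and so on. The feature class itself would not need to change.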
>
> Bio::ComplexLocationI -> ISA Bio::LocationI
> Purpose: capture location information for features that are not linear,
> as in an EMBL/GenBank join
> CDS join(544..589,688..1032)
>
> Methods:
> - sub_Locations() -> a list of LocationI objects that indicate
> start/stop boundaries for this object; must override overlap,
> contains, etc. from RangeI, since the coordinates are not
> contiguous
>
> Objects:
> Bio::SeqFeature::Generic -> ISA Bio::SeqFeatureI, Bio::LocationI
> add the location() method to this object; the LocationI object
> returned will be a reference to $self.
>
> Bio::SeqFeature::Complex -> ISA Bio::SeqFeatureI, Bio::ComplexLocationI
> Purpose: implementation to handle those join() statements
This is pretty much the outline you follow in the proposal on the
Wiki. The point I'm not so happy with is that purely
location-specific issues change the class (type) of a SeqFeature.
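One way around that concern is sketched below in Python. The class and function names (`Span`, `SplitLocation`, `overlaps`, `contains`) are hypothetical and not BioPerl code. The feature always holds a location object, so the difference between `1..345` and `join(544..589,688..1032)` lives only in the type of that location object. Overlap and containment are computed over sub-locations rather than the bounding range, which is the override the proposal asks for.

```python
class Span:
    """Hypothetical contiguous location, e.g. '544..589'."""

    def __init__(self, start, end):
        if start > end:
            raise ValueError("start must not exceed end")
        self.start, self.end = start, end

    def sub_locations(self):
        return [self]


class SplitLocation:
    """Hypothetical non-contiguous location, e.g. 'join(544..589,688..1032)'."""

    def __init__(self, parts):
        self.parts = sorted(parts, key=lambda p: p.start)

    def sub_locations(self):
        return list(self.parts)

    # Bounding coordinates, as a plain range would report them.
    @property
    def start(self):
        return self.parts[0].start

    @property
    def end(self):
        return max(p.end for p in self.parts)


def overlaps(a, b):
    """True if any sub-location of a shares a position with any of b."""
    return any(x.start <= y.end and y.start <= x.end
               for x in a.sub_locations() for y in b.sub_locations())


def contains(a, b):
    """True if every sub-location of b lies inside one sub-location of a."""
    return all(any(x.start <= y.start and y.end <= x.end
                   for x in a.sub_locations())
               for y in b.sub_locations())


class Feature:
    """The feature class stays the same whatever kind of location it holds."""

    def __init__(self, primary_tag, location):
        self.primary_tag = primary_tag
        self.location = location


cds = Feature("CDS", SplitLocation([Span(544, 589), Span(688, 1032)]))
intron = Span(590, 687)
print(cds.location.start, cds.location.end)   # 544 1032
print(overlaps(cds.location, intron))         # False: falls in the gap
print(contains(Span(1, 2000), cds.location))  # True
```

The bounding range 544..1032 would report an overlap with 590..687. The sub-location test correctly reports none.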
>
> I'm still not clear on what a fuzzy location is supposed to represent,
> i.e., does that mean we know that the feature is located somewhere
> in the range, but we don't know the exact start/stop?
Exactly. At least to my understanding.
> Why can't you treat
> it like real start/stop since we don't have any more information? Or
> would union/intersection calculations need to behave differently?
>
Well, biologically you can't, because annotating a sequence with
such a feature without indicating the uncertainty of start and end
is deceptive. For cDNA entries this is sometimes crucial: <1..100
as a CDS location means that the entry doesn't even contain the
start of the CDS, and it's totally unclear where that is.
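The sketch below shows what keeping that uncertainty could look like. It is a minimal Python example, and the names `Position` and `parse_simple_location` are hypothetical, not BioPerl API. Each end records whether it is exact or only bounded, as in `<1..100`. It handles only the simple `a..b` form with an optional `<` or `>` on either end; forms such as `(a.b)` or `a^b` are out of scope.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Hypothetical position: a value plus how it relates to the true coordinate."""
    value: int
    kind: str = "exact"  # "exact", "before" ('<'), or "after" ('>')

    @property
    def is_exact(self):
        return self.kind == "exact"


_SIMPLE = re.compile(r"^([<>]?)(\d+)\.\.([<>]?)(\d+)$")
_KIND = {"": "exact", "<": "before", ">": "after"}


def parse_simple_location(text):
    """Parse 'a..b' with an optional '<' or '>' on either end (hypothetical helper)."""
    m = _SIMPLE.match(text)
    if m is None:
        raise ValueError(f"unsupported location: {text!r}")
    start = Position(int(m.group(2)), _KIND[m.group(1)])
    end = Position(int(m.group(4)), _KIND[m.group(3)])
    return start, end


start, end = parse_simple_location("<1..100")
# The true CDS start lies before position 1 and is not in the entry,
# so code must not treat 1 as the real start.
print(start.is_exact, end.is_exact)  # False True
```

Code that computes, for example, a protein translation frame can check `is_exact` and refuse to guess, rather than silently treating 1 as the start codon.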
Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------