[Bioperl-l] Feature/Location

Jason Stajich jason@chg.mc.duke.edu
Tue, 23 Jan 2001 13:32:21 -0500 (EST)


On Tue, 23 Jan 2001, David Block wrote:

> On Tue, 23 Jan 2001, Ewan Birney wrote:
> 
> > On Tue, 23 Jan 2001, David Block wrote:
> > 
> > > > 
> > > > I updated the wiki - please feel free to make corrections, clarifications,
> > > > or to elaborate on the interfaces.  SplitLocationI will have a method 
> > > > sub_Locations which returns the list of LocationI objects that represent
> > > > the sub locations of the, well, location.  In code terms -
> > > > 
> > > > # get a $geneobj somehow
> > > > my $location = $geneobj->location;
> > > > if( $location->isa('Bio::Location::SplitLocationI') ) {
> > > > 	foreach my $exon ( $location->sub_Locations() ){ 
> > > > 		print "exon at ", $exon->start, "..", $exon->end, "\n";
> > > > 	}
> > > > }
> > > > 
> > > > One problem with this approach: what if I actually want the real
> > > > Exon objects?  Must I instead iterate through what is returned
> > > > by sub_Features?  Or does SeqFeature::GeneStructureI handle all
> > > > of this, so that I should call $geneobj->exons() without touching the
> > > > Location objects (which makes the most sense to me)?
> > > > 
> > > > -jason
> > > > 
> 
> Okay, for clarity, this is only relevant when there is a SplitLocationI
> situation, correct?  So the implementation of SplitLocationI was going to
> be an array of simple LocationI's?  If not, then what I'm talking about is
> irrelevant.
No, you're right.  I imagine it will be a list of LocationI objects at some
point.  sub_Locations will be a SplitLocationI method.
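
To make that concrete, here is a rough sketch of a split location held as a
list of simple locations.  The Sketch::* packages and describe_location are
made-up stand-ins for illustration only, not the real Bio::Location classes;
only the sub_Locations name comes from this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a simple LocationI implementation.
package Sketch::Location;
sub new {
    my ($class, %args) = @_;
    return bless { start => $args{start}, end => $args{end} }, $class;
}
sub start { my ($self) = @_; return $self->{start} }
sub end   { my ($self) = @_; return $self->{end} }

# Hypothetical stand-in for SplitLocationI: just an ordered list of
# simple locations, with start/end derived from the sub-locations.
package Sketch::SplitLocation;
our @ISA = ('Sketch::Location');
sub new {
    my ($class, @subs) = @_;
    my @sorted = sort { $a->start <=> $b->start } @subs;
    return bless { subs => \@sorted }, $class;
}
sub sub_Locations { my ($self) = @_; return @{ $self->{subs} } }
sub start {
    my ($self) = @_;
    my ($first) = $self->sub_Locations;   # already sorted by start
    return $first->start;
}
sub end {
    my ($self) = @_;
    my ($max) = sort { $b <=> $a } map { $_->end } $self->sub_Locations;
    return $max;
}

package main;

# Render any location as "start..end" pieces, descending into
# sub-locations only when the object offers them.
sub describe_location {
    my ($loc) = @_;
    my @parts = $loc->can('sub_Locations') ? $loc->sub_Locations : ($loc);
    return join ',', map { $_->start . '..' . $_->end } @parts;
}

my $gene_loc = Sketch::SplitLocation->new(
    Sketch::Location->new(start => 50, end => 80),
    Sketch::Location->new(start => 10, end => 20),
);
print describe_location($gene_loc), "\n";   # prints: 10..20,50..80
```

A caller that only wants to draw the pieces never needs to know whether it
was handed a simple or a split location.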

> 
> > > 
> > > That would be good.  Then you could call that exon's location method to
> > > get the location object of the exon.  So you have two routes to the
> > > start/end pair.  That sounds good to me.
> > 
> > <<Ewan Winces>>
> > 
> > I think we are giving ourselves *a lot of rope* to hang ourselves here and
> > we will end up with different conventions about how to descend these
> > objects...

I agree.  I think this is why you and I were leaning towards collapsing
Location into SeqFeature, but I also agree with many of the arguments for
splitting the two.
>
> 
> Different conventions for different situations?  What I was talking about
> was the two different situations:
> 1) gene drawing, I want to know all the locations that are 'gene' so I can
> draw them somehow -> sub_locations gives me a list of simple locations
> that I can iterate through.  I don't care about the nature of the exons I
> am drawing, just that they belong to a gene.
> 
> 2) exon interrogation, I want to examine each exon individually.  Now I
> want the gene/transcript's exons method to give me each exon.  Each of
> those also has a location.  The exon's location method links to the
> location object that is linked to by the sub_location call, so there is no
> duplication of data.

So we use the sub_SeqFeature method on a SeqFeatureI to get the list of
sub-features for a feature (since exons should be sub-features of a gene).
In specialized objects like Gene we could call exons() to get these
objects.
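
Roughly, the two routes might look like the sketch below.  Sketch::Feature
and Sketch::Gene are hypothetical stand-ins, not existing bioperl classes,
and I am assuming exons() is nothing more than a filter over the
sub-features by primary tag.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generic SeqFeatureI implementation.
package Sketch::Feature;
sub new {
    my ($class, %args) = @_;
    return bless {
        tag   => $args{primary_tag},
        start => $args{start},
        end   => $args{end},
        subs  => [],
    }, $class;
}
sub primary_tag { my ($self) = @_; return $self->{tag} }
sub start       { my ($self) = @_; return $self->{start} }
sub end         { my ($self) = @_; return $self->{end} }
sub add_sub_SeqFeature {
    my ($self, @feats) = @_;
    push @{ $self->{subs} }, @feats;
    return;
}
sub sub_SeqFeature { my ($self) = @_; return @{ $self->{subs} } }

# Hypothetical specialized feature: exons() is a convenience filter
# over the same sub-feature list, so no data is duplicated.
package Sketch::Gene;
our @ISA = ('Sketch::Feature');
sub exons {
    my ($self) = @_;
    return grep { $_->primary_tag eq 'exon' } $self->sub_SeqFeature;
}

package main;

my $gene = Sketch::Gene->new(primary_tag => 'gene', start => 10, end => 80);
$gene->add_sub_SeqFeature(
    Sketch::Feature->new(primary_tag => 'exon',       start => 10, end => 20),
    Sketch::Feature->new(primary_tag => 'exon',       start => 50, end => 80),
    Sketch::Feature->new(primary_tag => 'polyA_site', start => 80, end => 80),
);

# Generic route: every sub-feature, whatever its type.
my @all = $gene->sub_SeqFeature;

# Specialized route: only the exons, as full feature objects.
foreach my $exon ( $gene->exons ) {
    print "exon at ", $exon->start, "..", $exon->end, "\n";
}
```

Because exons() returns the same objects that sub_SeqFeature holds, there is
only one convention for descending the tree, and the specialized call is
just sugar on top of it.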

> And if any of these exon locations are fuzzy or split, etc., the location
> object gives us that.
>  

Without getting lost in example land, here is one question about how to
instantiate these things:

Imagine the case of parsing a GenBank/EMBL file with annotated genes on a
genomic sequence via the bioperl SeqIO system.  We get to a SplitLocation.
How should we represent the object?  If primary_tag eq 'CDS', do we
instantiate a GeneStructure object?  Otherwise we will instantiate all
features with SeqFeature::Generic; some will have LocationI locations,
some will have SplitLocationI locations.  Assuming we capture all of the
information encoded about each feature, a user could write code to
transform collections of features with CDS, source, exon, etc. primary
tags retrieved from a GenBank/EMBL parse into a GeneStructure object.

I am going to guess that at some point we'd like to write an object that
handles this gene instantiation in the common case, or at least gives good
examples of how to do it.
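
As a very rough illustration of what such a helper could do, the sketch
below groups flat feature records into per-gene structures.  Everything in
it is an assumption made for the example: build_gene_structures is a made-up
name, features are plain hashes rather than SeqFeature objects, and features
are tied together by a shared /gene qualifier, which real entries do not
always provide.

```perl
use strict;
use warnings;

# Hypothetical helper: collect exon and CDS segments per gene name.
# Each feature is a hash with a primary_tag, an optional gene qualifier,
# and a list of [start, end] segments (more than one for a split location).
sub build_gene_structures {
    my (@features) = @_;
    my %genes;
    for my $feat (@features) {
        my $name = $feat->{gene};
        next unless defined $name;    # e.g. 'source' carries no /gene here
        my $gene = $genes{$name} ||= { name => $name, exons => [], cds => [] };
        if ( $feat->{primary_tag} eq 'exon' ) {
            push @{ $gene->{exons} }, @{ $feat->{segments} };
        }
        elsif ( $feat->{primary_tag} eq 'CDS' ) {
            push @{ $gene->{cds} }, @{ $feat->{segments} };
        }
    }
    return \%genes;
}

my $genes = build_gene_structures(
    { primary_tag => 'source', segments => [ [ 1, 1000 ] ] },
    { primary_tag => 'CDS',  gene => 'abc1', segments => [ [ 10, 20 ], [ 50, 80 ] ] },
    { primary_tag => 'exon', gene => 'abc1', segments => [ [ 5, 20 ] ] },
    { primary_tag => 'exon', gene => 'abc1', segments => [ [ 50, 95 ] ] },
);

for my $name ( sort keys %$genes ) {
    printf "%s: %d exons, %d CDS segments\n",
        $name,
        scalar @{ $genes->{$name}{exons} },
        scalar @{ $genes->{$name}{cds} };
}
```

Entries without a usable qualifier would need some other rule, such as
containment of exon coordinates within a CDS or mRNA span, which is exactly
the sort of policy a dedicated object could encapsulate.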
 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/