[Bioperl-l] HOWTO: take a slice of a split location

Mon Dec 12 10:45:06 EST 2005

Thanks Jason, 

If I simply wanted the 3'most 1Kbp of CDS I would have taken your
approach, but, I rather needed to focus my algorithm on this location,
which may be a split location, and then select the 'best' sublocation
(exon) which doesn't span a splice site, so I needed to preserve the
gaps.  

I'm guess I could have simply collected sequence starting at the 3'most
sublocation and stopped when I got my fill... I guess I found the
expressivity of taking an (appropriatly oriented) `slice` of the entire
CDS to compelling to avoid adding it to LocationI, which I hope to do
one I have my implementation performance detail worked out (film at 11).
Any objections to this pending addition?  I think it is in spirit, do
you?

Looking at Bio::Coordinate::GeneMapper, I don't see how this could
address this specifically.  Pointers welcomed!

Cheers,

Malcolm

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at duke.edu] 
Sent: Saturday, December 10, 2005 12:12 PM
To: Cook, Malcolm
Cc: bioperl-l; Seidel, Christopher
Subject: Re: [Bioperl-l] HOWTO: take a slice of a split location

Hi Malcom -
Don't have a chance to look at your code, but my approach to this  
problem would be to first splice the sequence out from the genome
  my $feature = Bio::SeqFeature::Generic->new(-location =>  
$splitlocation);
  my $cdsseq = $feature->spliced_seq;
then just retrieve the last 1000 bases of  this sequence.
  my $threeprime = $cdsseq->subseq($cdsseq->length - 1000, $cdsseq- 
 >length);
(this might be off-by-one?)

There is also a module to map between coordinates -  
Bio::Coordinate::GeneMapper if you need to go from transcript to  
genomic coordinates.

-jason
On Dec 10, 2005, at 2:06 AM, Cook, Malcolm wrote:

> Fellow Bioperlers,
>
> I was in need of extracting the 3'-most 1000 bp of from multiple  
> genomic CDS regions (designing 70mer u-array probes).
>
> I looked in vain for Bio::Location->splice($from,$to);
>
> So I wrote one which works but suffers from actually materializing  
> the list of interger indices into the sequence for every base.
>
> Has anyone a better approach they'd care to share?
>
> Malcolm Cook - mec at stowers-institute.org
> Stowers Institute for Medical Research - Kansas City, MO  USA
>
> P.S. Here' what I wrote:
>
> package Bio::LocationI;		# Code in the interface so it
works
>                                 # with both ::Split and ::Simple
>                                 # Bio::Locations
>
> sub _intspans {
>   # Purpose: for a (presumably) monotonically increasing list of
>   # integers, return list of arrays each holding min and max of
>   # the list's internal contiguous spans.
>   #
>   # Example: 1..5,10..20,30 => ([1,5],[10,20],[30,30])
>   my @i = @_;
>   die "nothing passed to intspans" unless @i;
>   my @s = ([$i[0],shift(@i)]);
>   foreach (@i) {
>     if ($_ == 1 + $s[0][1]) {
>       $s[0][1] = $_;
>     } else {
>       unshift @s, [$_, $_]
>     }}
>   reverse @s;
> }
>
> sub slice {
>   # Purpose: compute a slice of the Location, using perls normal slice
>   # semantics, expect that it trims out of range values.
>   my ($self, $from, $to) = @_;
>   my @int = eval (join ',', map {$_->start . '..' . $_->end} $self- 
> >each_Location); # build perl expression using the range (..) and  
> list (,) operators.
>   @int = @int[$from..$to];
>   @int = grep {$_} @int;	# Removing undefs (in case $from/$to out

> of bounds).
>   my @intspans = _intspans(@int);
>   new  Bio::Location::Split (-strand => $self->strand,
> 			     -locations => [map {new
Bio::Location::Simple(-start => $_-> 
> [0],
>
-end   => $_->[1],
>
-strand => $self->strand,
>
)
> 					       } @intspans],
> 			    );
> }
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12