[Bioperl-l] HOWTO: take a slice of a split location

Cook, Malcolm MEC at stowers-institute.org
Sat Dec 10 02:06:03 EST 2005


Fellow Bioperlers,

I needed to extract the 3'-most 1000 bp from multiple genomic CDS regions (for designing 70mer microarray probes).

I looked in vain for Bio::Location->splice($from,$to);

So I wrote one. It works, but it suffers from actually materializing the list of integer indices into the sequence, one for every base.

Has anyone a better approach they'd care to share?  

Malcolm Cook - mec at stowers-institute.org
Stowers Institute for Medical Research - Kansas City, MO  USA 

P.S. Here's what I wrote:

package Bio::LocationI;		# Code in the interface so it works
                                # with both ::Split and ::Simple
                                # Bio::Locations

sub _intspans {
  # Purpose: for a (presumably) monotonically increasing list of
  # integers, return a list of array refs, each holding the min and
  # max of one of the list's internal contiguous spans.
  #
  # Example: 1..5,10..20,30 => ([1,5],[10,20],[30,30])
  my @i = @_;
  die "nothing passed to _intspans" unless @i;
  my $first = shift @i;
  my @s = ([$first, $first]);   # spans, most recent first
  foreach (@i) {
    if ($_ == 1 + $s[0][1]) {
      $s[0][1] = $_;            # extend the current span
    } else {
      unshift @s, [$_, $_];     # start a new span
    }
  }
  reverse @s;
}

use Bio::Location::Simple;
use Bio::Location::Split;

sub slice {
  # Purpose: compute a slice of the Location using Perl's normal slice
  # semantics, except that out-of-range values are trimmed.
  my ($self, $from, $to) = @_;
  # Expand every sub-location into its individual base coordinates.
  my @int = map { $_->start .. $_->end } $self->each_Location;
  @int = @int[$from..$to];
  @int = grep { defined } @int;  # Remove undefs (in case $from/$to are out of bounds).
  my @intspans = _intspans(@int);
  Bio::Location::Split->new(
    -strand    => $self->strand,
    -locations => [ map { Bio::Location::Simple->new(-start  => $_->[0],
                                                      -end    => $_->[1],
                                                      -strand => $self->strand) }
                    @intspans ],
  );
}
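
For comparison, here is a minimal sketch of the same kind of slice computed directly on [start, end] pairs, so that no per-base list is built. It is plain Perl with no BioPerl dependency. The name slice_spans and the restriction to non-negative, 0-based, inclusive offsets are assumptions made for illustration; neither is part of the code above or of Bio::Location.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): slice a list of [start, end]
# spans by 0-based, inclusive offsets $from..$to counted along the
# concatenated spans, without expanding every base.
# Assumes 0 <= $from <= $to; offsets past the end are trimmed.
sub slice_spans {
    my ($from, $to, @spans) = @_;
    my @out;
    my $offset = 0;    # offset of the current span's first base
    for my $span (@spans) {
        my ($start, $end) = @$span;
        my $len = $end - $start + 1;
        # Overlap of [$from, $to] with this span, in offset coordinates.
        my $lo = $from > $offset ? $from : $offset;
        my $hi = $to < $offset + $len - 1 ? $to : $offset + $len - 1;
        push @out, [ $start + $lo - $offset, $start + $hi - $offset ]
            if $lo <= $hi;
        $offset += $len;
    }
    return @out;
}

# Offsets 3..7 of the spans 1..5,10..20,30 cover bases 4,5,10,11,12.
my @r = slice_spans(3, 7, [1, 5], [10, 20], [30, 30]);
print join(',', map { "$_->[0]..$_->[1]" } @r), "\n";   # prints 4..5,10..12
```

Unlike _intspans, this sketch does not merge adjacent spans, and it does not handle negative offsets. To take the last N bases, the caller would first sum the span lengths and pass $total - N as $from.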


