[Bioperl-l] HOWTO: take a slice of a split location
Cook, Malcolm
MEC at stowers-institute.org
Mon Dec 12 10:45:06 EST 2005
Thanks Jason,
If I simply wanted the 3'most 1Kbp of CDS I would have taken your
approach, but, I rather needed to focus my algorithm on this location,
which may be a split location, and then select the 'best' sublocation
(exon) which doesn't span a splice site, so I needed to preserve the
gaps.
I'm guess I could have simply collected sequence starting at the 3'most
sublocation and stopped when I got my fill... I guess I found the
expressivity of taking an (appropriatly oriented) `slice` of the entire
CDS to compelling to avoid adding it to LocationI, which I hope to do
one I have my implementation performance detail worked out (film at 11).
Any objections to this pending addition? I think it is in spirit, do
you?
Looking at Bio::Coordinate::GeneMapper, I don't see how this could
address this specifically. Pointers welcomed!
Cheers,
Malcolm
-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at duke.edu]
Sent: Saturday, December 10, 2005 12:12 PM
To: Cook, Malcolm
Cc: bioperl-l; Seidel, Christopher
Subject: Re: [Bioperl-l] HOWTO: take a slice of a split location
Hi Malcom -
Don't have a chance to look at your code, but my approach to this
problem would be to first splice the sequence out from the genome
my $feature = Bio::SeqFeature::Generic->new(-location =>
$splitlocation);
my $cdsseq = $feature->spliced_seq;
then just retrieve the last 1000 bases of this sequence.
my $threeprime = $cdsseq->subseq($cdsseq->length - 1000, $cdsseq-
>length);
(this might be off-by-one?)
There is also a module to map between coordinates -
Bio::Coordinate::GeneMapper if you need to go from transcript to
genomic coordinates.
-jason
On Dec 10, 2005, at 2:06 AM, Cook, Malcolm wrote:
> Fellow Bioperlers,
>
> I was in need of extracting the 3'-most 1000 bp of from multiple
> genomic CDS regions (designing 70mer u-array probes).
>
> I looked in vain for Bio::Location->splice($from,$to);
>
> So I wrote one which works but suffers from actually materializing
> the list of interger indices into the sequence for every base.
>
> Has anyone a better approach they'd care to share?
>
> Malcolm Cook - mec at stowers-institute.org
> Stowers Institute for Medical Research - Kansas City, MO USA
>
> P.S. Here' what I wrote:
>
> package Bio::LocationI; # Code in the interface so it
works
> # with both ::Split and ::Simple
> # Bio::Locations
>
> sub _intspans {
> # Purpose: for a (presumably) monotonically increasing list of
> # integers, return list of arrays each holding min and max of
> # the list's internal contiguous spans.
> #
> # Example: 1..5,10..20,30 => ([1,5],[10,20],[30,30])
> my @i = @_;
> die "nothing passed to intspans" unless @i;
> my @s = ([$i[0],shift(@i)]);
> foreach (@i) {
> if ($_ == 1 + $s[0][1]) {
> $s[0][1] = $_;
> } else {
> unshift @s, [$_, $_]
> }}
> reverse @s;
> }
>
> sub slice {
> # Purpose: compute a slice of the Location, using perls normal slice
> # semantics, expect that it trims out of range values.
> my ($self, $from, $to) = @_;
> my @int = eval (join ',', map {$_->start . '..' . $_->end} $self-
> >each_Location); # build perl expression using the range (..) and
> list (,) operators.
> @int = @int[$from..$to];
> @int = grep {$_} @int; # Removing undefs (in case $from/$to out
> of bounds).
> my @intspans = _intspans(@int);
> new Bio::Location::Split (-strand => $self->strand,
> -locations => [map {new
Bio::Location::Simple(-start => $_->
> [0],
>
-end => $_->[1],
>
-strand => $self->strand,
>
)
> } @intspans],
> );
> }
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
More information about the Bioperl-l
mailing list