[Bioperl-l] Clear range from Bio::Seq::Quality?

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Mon Apr 27 06:42:08 UTC 2009


Dan,

It looks like your method does two different things:

1. Returns the longest subsequence above the threshold
2. Analyses the the sequence for the number of ranges the current
threshold creates.

Why not separate these functions?

Lets add a method that sets the threshold and stores it internally as
$self->_threshold. Setting it to a new values should trigger emptying
all the caches (see below.)

Lets have two more public methods:

1. get_clean_range() - optional argument 'threshold'

It returns the longest clean subseq.

2. count_clean_ranges() -again optional argument 'threshold'

This returns the number of ranges detected.

Both methods call first the public method threshold if the argument
has been given and then an internal method  _find_clean_ranges(). That
method calculates all the ranges and stores them internally  (as
$self->_clean_ranges-> [...]). The number of ranges is also stored
(e.g. $self->_number_of ranges).These internal values form  the cache
that needs to be emptied whenever any of the critical values of the
object changes: threshold, quality or seq. Create an internal method
$self->_clear_cache, that does that.

Now the quality new object does not get created until you call
get_clean_range() which accesses the cached values (or creates them if
they are not there).

This design allows you to have no extra penalty for adding more
methods that act on cached values. For example, it might be sensible
thing to do  at some point to look at all the ranges that are longer
than some length. Then you could write in your program:


$qual->threshold(10);
if ($qual->count_clean_ranges = 1) {
  my $newqual = $qual->get_clean_range()
  # do your analysis
} elsif ($qual->count_clean_ranges = 0) {
   # do some reporting and logging
} else {  # more than one ranges
   my @quals = $qual->get_all_clean_ranges($min_lenght);
   # do some more work and possibly select the best one(s)
}



Yours,

   -Heikki

2009/4/24 Chris Fields <cjfields at illinois.edu>:
> You could submit this as a diff against Bio::Seq::Quality to bugzilla.  If
> possible, tests don't hurt either!
>
> chris
>
> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>
>> Its a bit rough and ready, but it does what I need...
>>
>>
>>
>>
>> =head2 get_clear_range
>>
>> Title    : get_clear_range
>>
>> Title    : subqual
>> Usage    : $subobj = $obj->get_clear_range();
>>           $subobj = $obj->get_clear_range(20);
>> Function : Get the clear range using the given quality score as a
>>           cutoff or a default value of 13.
>>
>> Returns  : a new Bio::Seq::Quality object
>> Args     : a minimum quality value, optional, devault = 13
>>
>> =cut
>>
>> sub get_clear_range
>> {
>>   my $self = shift;
>>   my $qual = $self->qual;
>>   my $minQual = shift || 13;
>>
>>   my (@ranges, $rangeFlag);
>>
>>   for(my $i=0; $i<@$qual; $i++){
>>        ## Are we currently within a clear range or not?
>>        if(defined($rangeFlag)){
>>            ## Did we just leave the clear range?
>>            if($qual->[$i]<$minQual){
>>                ## Log the range
>>                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>                ## and reset the range flag.
>>                $rangeFlag = undef;
>>            }
>>            ## else nothing changes
>>        }
>>        else{
>>            ## Did we just enter a clear range?
>>            if($qual->[$i]>=$minQual){
>>                ## Better set the range flag!
>>                $rangeFlag = $i;
>>            }
>>            ## else nothing changes
>>        }
>>   }
>>   ## Did we exit the last clear range?
>>   if(defined($rangeFlag)){
>>        my $i = scalar(@$qual);
>>        ## Log the range
>>        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>   }
>>
>>   unless(@ranges){
>>        die "There is no clear range... I don't know what to do here!\n";
>>   }
>>
>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>
>>   my $sum; map {$sum += $_->[2]} @ranges;
>>
>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>        "quality scores above the given threshold\n";
>>
>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>        if($_->[2]/$sum < 0.5){
>>            warn "not so much a clear range as a clear chunk...\n";
>>        }
>>        print $_->[2], "\t", $_->[2]/$sum, "\n";
>>
>>        return Bio::Seq::QualityDB->new( -seq => $self->subseq(  $_->[0]+1,
>> $_->[1]+1),
>>                                         -qual => $self->subqual($_->[0]+1,
>> $_->[1]+1)
>>                                         );
>>   }
>> }
>>
>>
>>
>>
>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>> That is why the 'new Bio::Seq::Quality object' is actually a
>> Bio::Seq::QualityDB object, but other than that it should slot right
>> in (apart from all the debugging output that I spit out).
>>
>>
>> Cheers,
>> Dan.
>>
>>
>> 2009/4/24 Dan Bolser <dan.bolser at gmail.com>:
>>>
>>> Hi all,
>>>
>>> I couldn't find out how to get the 'clear range' from a
>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>> this method be a part of the Bio::Seq::Quality class?
>>>
>>> In the latter case I'm on my way to an implementation, but I am not
>>> good at navigating the bioperl docs, so I thought I should ask before
>>> I take the time to finish that off.
>>>
>>>
>>> Cheers,
>>> Dan.
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



-- 
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa




More information about the Bioperl-l mailing list