[Bioperl-l] Clear range from Bio::Seq::Quality?

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Mon Apr 27 09:41:52 UTC 2009


Dan,

I'll take your code and put it into bioperl-live rewritten the way I
suggested and add few tests.

That should get you started,

   -Heikki

2009/4/27 Dan Bolser <dan.bolser at gmail.com>:
> Hi Heikki,
>
> Thanks very much for the advice on how to better implement the clear
> range method within the Bio::Seq::Quality object. I can understand the
> logic of what you have written, and it all sounds reasonable. The only
> problem is that I am very inexperienced with working on object
> oriented Perl (my 'one man' projects to date have never really
> required me to think beyond scripts, and its been years since I
> actually tried to code objects in Perl).
>
> To be specific, when you say, "Lets add a method that sets the
> threshold and stores it internally as $self->_threshold", ignoring any
> other functionality, what would that method look like? in particular,
> how would $self->_threshold be implemented?
>
> I think once I see that detail, I can go ahead and try to code what
> you suggested.
>
>
> Similarly (Chris), where would I put the tests / how would they be implemented?
>
>
> Thanks again for the feedback.
>
> All the best,
> Dan.
>
>
>
> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
>> Dan,
>>
>> It looks like your method does two different things:
>>
>> 1. Returns the longest subsequence above the threshold
>> 2. Analyses the the sequence for the number of ranges the current
>> threshold creates.
>>
>> Why not separate these functions?
>>
>> Lets add a method that sets the threshold and stores it internally as
>> $self->_threshold. Setting it to a new values should trigger emptying
>> all the caches (see below.)
>>
>> Lets have two more public methods:
>>
>> 1. get_clean_range() - optional argument 'threshold'
>>
>> It returns the longest clean subseq.
>>
>> 2. count_clean_ranges() -again optional argument 'threshold'
>>
>> This returns the number of ranges detected.
>>
>> Both methods call first the public method threshold if the argument
>> has been given and then an internal method  _find_clean_ranges(). That
>> method calculates all the ranges and stores them internally  (as
>> $self->_clean_ranges-> [...]). The number of ranges is also stored
>> (e.g. $self->_number_of ranges).These internal values form  the cache
>> that needs to be emptied whenever any of the critical values of the
>> object changes: threshold, quality or seq. Create an internal method
>> $self->_clear_cache, that does that.
>>
>> Now the quality new object does not get created until you call
>> get_clean_range() which accesses the cached values (or creates them if
>> they are not there).
>>
>> This design allows you to have no extra penalty for adding more
>> methods that act on cached values. For example, it might be sensible
>> thing to do  at some point to look at all the ranges that are longer
>> than some length. Then you could write in your program:
>>
>>
>> $qual->threshold(10);
>> if ($qual->count_clean_ranges = 1) {
>>  my $newqual = $qual->get_clean_range()
>>  # do your analysis
>> } elsif ($qual->count_clean_ranges = 0) {
>>   # do some reporting and logging
>> } else {  # more than one ranges
>>   my @quals = $qual->get_all_clean_ranges($min_lenght);
>>   # do some more work and possibly select the best one(s)
>> }
>>
>>
>>
>> Yours,
>>
>>   -Heikki
>>
>> 2009/4/24 Chris Fields <cjfields at illinois.edu>:
>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla.  If
>>> possible, tests don't hurt either!
>>>
>>> chris
>>>
>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>
>>>> Its a bit rough and ready, but it does what I need...
>>>>
>>>>
>>>>
>>>>
>>>> =head2 get_clear_range
>>>>
>>>> Title    : get_clear_range
>>>>
>>>> Title    : subqual
>>>> Usage    : $subobj = $obj->get_clear_range();
>>>>           $subobj = $obj->get_clear_range(20);
>>>> Function : Get the clear range using the given quality score as a
>>>>           cutoff or a default value of 13.
>>>>
>>>> Returns  : a new Bio::Seq::Quality object
>>>> Args     : a minimum quality value, optional, devault = 13
>>>>
>>>> =cut
>>>>
>>>> sub get_clear_range
>>>> {
>>>>   my $self = shift;
>>>>   my $qual = $self->qual;
>>>>   my $minQual = shift || 13;
>>>>
>>>>   my (@ranges, $rangeFlag);
>>>>
>>>>   for(my $i=0; $i<@$qual; $i++){
>>>>        ## Are we currently within a clear range or not?
>>>>        if(defined($rangeFlag)){
>>>>            ## Did we just leave the clear range?
>>>>            if($qual->[$i]<$minQual){
>>>>                ## Log the range
>>>>                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>                ## and reset the range flag.
>>>>                $rangeFlag = undef;
>>>>            }
>>>>            ## else nothing changes
>>>>        }
>>>>        else{
>>>>            ## Did we just enter a clear range?
>>>>            if($qual->[$i]>=$minQual){
>>>>                ## Better set the range flag!
>>>>                $rangeFlag = $i;
>>>>            }
>>>>            ## else nothing changes
>>>>        }
>>>>   }
>>>>   ## Did we exit the last clear range?
>>>>   if(defined($rangeFlag)){
>>>>        my $i = scalar(@$qual);
>>>>        ## Log the range
>>>>        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>   }
>>>>
>>>>   unless(@ranges){
>>>>        die "There is no clear range... I don't know what to do here!\n";
>>>>   }
>>>>
>>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>>
>>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>>
>>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>        "quality scores above the given threshold\n";
>>>>
>>>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>        if($_->[2]/$sum < 0.5){
>>>>            warn "not so much a clear range as a clear chunk...\n";
>>>>        }
>>>>        print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>
>>>>        return Bio::Seq::QualityDB->new( -seq => $self->subseq(  $_->[0]+1,
>>>> $_->[1]+1),
>>>>                                         -qual => $self->subqual($_->[0]+1,
>>>> $_->[1]+1)
>>>>                                         );
>>>>   }
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>> in (apart from all the debugging output that I spit out).
>>>>
>>>>
>>>> Cheers,
>>>> Dan.
>>>>
>>>>
>>>> 2009/4/24 Dan Bolser <dan.bolser at gmail.com>:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I couldn't find out how to get the 'clear range' from a
>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>
>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>> I take the time to finish that off.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Dan.
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>>    -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>



-- 
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa




More information about the Bioperl-l mailing list