[Bioperl-l] Clear range from Bio::Seq::Quality?

Dan Bolser dan.bolser at gmail.com
Tue Aug 4 12:14:02 UTC 2009


2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> Dan,
>
> Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.
>
> Test and extend,
>
>    -Heikki

Thanks for help with this. I finally got round to looking at the code
(after several others had done the same). I have messed with the code
a bit, and added a 'mask_below_threshold' method [1] and some tests to
go with it (including some extra tests) [2].

Cheers,
Dan.


[1] http://bugzilla.open-bio.org/show_bug.cgi?id=2897
[2] http://bugzilla.open-bio.org/show_bug.cgi?id=2898


> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
>> Dan,
>>
>> I'll take your code and put it into bioperl-live rewritten the way I
>> suggested and add few tests.
>>
>> That should get you started,
>>
>>   -Heikki
>>
>> 2009/4/27 Dan Bolser <dan.bolser at gmail.com>:
>>> Hi Heikki,
>>>
>>> Thanks very much for the advice on how to better implement the clear
>>> range method within the Bio::Seq::Quality object. I can understand the
>>> logic of what you have written, and it all sounds reasonable. The only
>>> problem is that I am very inexperienced with working on object
>>> oriented Perl (my 'one man' projects to date have never really
>>> required me to think beyond scripts, and its been years since I
>>> actually tried to code objects in Perl).
>>>
>>> To be specific, when you say, "Lets add a method that sets the
>>> threshold and stores it internally as $self->_threshold", ignoring any
>>> other functionality, what would that method look like? in particular,
>>> how would $self->_threshold be implemented?
>>>
>>> I think once I see that detail, I can go ahead and try to code what
>>> you suggested.
>>>
>>>
>>> Similarly (Chris), where would I put the tests / how would they be implemented?
>>>
>>>
>>> Thanks again for the feedback.
>>>
>>> All the best,
>>> Dan.
>>>
>>>
>>>
>>> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
>>>> Dan,
>>>>
>>>> It looks like your method does two different things:
>>>>
>>>> 1. Returns the longest subsequence above the threshold
>>>> 2. Analyses the the sequence for the number of ranges the current
>>>> threshold creates.
>>>>
>>>> Why not separate these functions?
>>>>
>>>> Lets add a method that sets the threshold and stores it internally as
>>>> $self->_threshold. Setting it to a new values should trigger emptying
>>>> all the caches (see below.)
>>>>
>>>> Lets have two more public methods:
>>>>
>>>> 1. get_clean_range() - optional argument 'threshold'
>>>>
>>>> It returns the longest clean subseq.
>>>>
>>>> 2. count_clean_ranges() -again optional argument 'threshold'
>>>>
>>>> This returns the number of ranges detected.
>>>>
>>>> Both methods call first the public method threshold if the argument
>>>> has been given and then an internal method  _find_clean_ranges(). That
>>>> method calculates all the ranges and stores them internally  (as
>>>> $self->_clean_ranges-> [...]). The number of ranges is also stored
>>>> (e.g. $self->_number_of ranges).These internal values form  the cache
>>>> that needs to be emptied whenever any of the critical values of the
>>>> object changes: threshold, quality or seq. Create an internal method
>>>> $self->_clear_cache, that does that.
>>>>
>>>> Now the quality new object does not get created until you call
>>>> get_clean_range() which accesses the cached values (or creates them if
>>>> they are not there).
>>>>
>>>> This design allows you to have no extra penalty for adding more
>>>> methods that act on cached values. For example, it might be sensible
>>>> thing to do  at some point to look at all the ranges that are longer
>>>> than some length. Then you could write in your program:
>>>>
>>>>
>>>> $qual->threshold(10);
>>>> if ($qual->count_clean_ranges = 1) {
>>>>  my $newqual = $qual->get_clean_range()
>>>>  # do your analysis
>>>> } elsif ($qual->count_clean_ranges = 0) {
>>>>   # do some reporting and logging
>>>> } else {  # more than one ranges
>>>>   my @quals = $qual->get_all_clean_ranges($min_lenght);
>>>>   # do some more work and possibly select the best one(s)
>>>> }
>>>>
>>>>
>>>>
>>>> Yours,
>>>>
>>>>   -Heikki
>>>>
>>>> 2009/4/24 Chris Fields <cjfields at illinois.edu>:
>>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla.  If
>>>>> possible, tests don't hurt either!
>>>>>
>>>>> chris
>>>>>
>>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>>>
>>>>>> Its a bit rough and ready, but it does what I need...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> =head2 get_clear_range
>>>>>>
>>>>>> Title    : get_clear_range
>>>>>>
>>>>>> Title    : subqual
>>>>>> Usage    : $subobj = $obj->get_clear_range();
>>>>>>           $subobj = $obj->get_clear_range(20);
>>>>>> Function : Get the clear range using the given quality score as a
>>>>>>           cutoff or a default value of 13.
>>>>>>
>>>>>> Returns  : a new Bio::Seq::Quality object
>>>>>> Args     : a minimum quality value, optional, devault = 13
>>>>>>
>>>>>> =cut
>>>>>>
>>>>>> sub get_clear_range
>>>>>> {
>>>>>>   my $self = shift;
>>>>>>   my $qual = $self->qual;
>>>>>>   my $minQual = shift || 13;
>>>>>>
>>>>>>   my (@ranges, $rangeFlag);
>>>>>>
>>>>>>   for(my $i=0; $i<@$qual; $i++){
>>>>>>        ## Are we currently within a clear range or not?
>>>>>>        if(defined($rangeFlag)){
>>>>>>            ## Did we just leave the clear range?
>>>>>>            if($qual->[$i]<$minQual){
>>>>>>                ## Log the range
>>>>>>                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>                ## and reset the range flag.
>>>>>>                $rangeFlag = undef;
>>>>>>            }
>>>>>>            ## else nothing changes
>>>>>>        }
>>>>>>        else{
>>>>>>            ## Did we just enter a clear range?
>>>>>>            if($qual->[$i]>=$minQual){
>>>>>>                ## Better set the range flag!
>>>>>>                $rangeFlag = $i;
>>>>>>            }
>>>>>>            ## else nothing changes
>>>>>>        }
>>>>>>   }
>>>>>>   ## Did we exit the last clear range?
>>>>>>   if(defined($rangeFlag)){
>>>>>>        my $i = scalar(@$qual);
>>>>>>        ## Log the range
>>>>>>        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>   }
>>>>>>
>>>>>>   unless(@ranges){
>>>>>>        die "There is no clear range... I don't know what to do here!\n";
>>>>>>   }
>>>>>>
>>>>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>
>>>>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>
>>>>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>>>        "quality scores above the given threshold\n";
>>>>>>
>>>>>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>>>        if($_->[2]/$sum < 0.5){
>>>>>>            warn "not so much a clear range as a clear chunk...\n";
>>>>>>        }
>>>>>>        print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>
>>>>>>        return Bio::Seq::QualityDB->new( -seq => $self->subseq(  $_->[0]+1,
>>>>>> $_->[1]+1),
>>>>>>                                         -qual => $self->subqual($_->[0]+1,
>>>>>> $_->[1]+1)
>>>>>>                                         );
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>>>> in (apart from all the debugging output that I spit out).
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>> 2009/4/24 Dan Bolser <dan.bolser at gmail.com>:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I couldn't find out how to get the 'clear range' from a
>>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>>>
>>>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>>>> I take the time to finish that off.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dan.
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>    -Heikki
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +27 (0)714328090
>>>> Sent from Claremont, WC, South Africa
>>>>
>>>
>>
>>
>>
>> --
>>    -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
>
>
> --
>    -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>




More information about the Bioperl-l mailing list