[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit

Stefan Kirov skirov at utk.edu
Thu May 27 18:18:56 EDT 2004



Aaron J. Mackey wrote:

>
> OK, so there's nothing wrong with me asking for a threshold less than  
> 0.3, it just means that I may get lots of N's, but maybe not (depends  
> on the profile).  I don't think those limits should be hard coded  
> (rather, they should be limits of 0 and 1.0).  0.0 gives me all N's,  
> while 1.0 gives me undef (though 0.99 could give me a sequence back 
> if  all the positions have one residue with p > 0.99).  -0.05 or 1.2 
> or 10,  etc. should throw.
>
Done.

> What I meant about using Bio::Tools::IUPAC is that from the threshold  
> and prob. matrix, you should be able to generate a single nucleotide  
> sequence with IUPAC ambiguity codes (i.e. "N", "R", "Y", etc).  
> Perhaps  that's all you should do, and let the user then do what they 
> want with  it (perhaps give it to Bio::Tools::IUPAC to get the seq 
> stream, but  perhaps not).  In which case, you'd rename this to 
> get_consensus() or  somesuch.
>
Yes, I got this, there is $matrix->IUPAC method that will do that. 
However, I find the generation of set of sequences through a threshold 
less confusing than playing with the same threshold to generate less or 
more relaxed IUPAC consensus. I guess there is place for both 
approaches, but this one more convinient for me.
Stefan

> -Aaron
>
> On May 27, 2004, at 5:25 PM, Stefan Kirov wrote:
>
>> Hi Aaron,
>> Yes, it might be a bit weird. Bio::Matrix::PSM is nucleotide 
>> specific,  but you are right that the method is essentially the same 
>> as the seq  stream, provided by Bio::Tools::IUPAC, except for the 
>> threshold. And  yes, maybe it makes sense to make the threshold 0.3 
>> instead of 3. It's  just the way I have used it so far. It is not 
>> hard to change it. Range  for the thresholds: I guess this is more 
>> nucleotide specific stuff- a  frequency threshold under 0.3 is likely 
>> to get almost everything  accepted (so you get N), where seq 
>> threshold above  0.7 is likely to  give you a sequence equal to what 
>> Bio::Matrix::PSM::consensus will  give you. Why not use the 
>> Bio::Tools::IUPAC- because of the threshold.  I guess I could 
>> generate several IUPAC consensus sequences and do then  the 
>> Bio::Tools::IUPAC::next_seq, but this is much more  straightforward. 
>> On the other hand maybe Bio::Tools::IUPAC could be  extended to 
>> accept threshold?
>> Stefan
>>
>> Aaron J. Mackey wrote:
>>
>>>
>>> Hi Stefan,
>>>
>>> I have a few questions about this latest commit; I'm sure it does  
>>> what  you need it to do, but it's a little "crufty".
>>>
>>> What does this mean, why would you provide a probability threshold 
>>> in   whole integers, and why are values outside of 3 and 7 illegal?  
>>> Is   Bio::Matrix::PSM nucleotide specific?  Why wouldn't this   
>>> "get_all_vectors" method be useful for any PSM?  Why not use   
>>> Bio::Tools::IUPAC to generate a sequence stream from a calculated   
>>> consensus sequence?
>>>
>>> -Aaron
>>>
>>> On May 27, 2004, at 2:37 PM, Stefan Kirov wrote:
>>>
>>>>
>>>> skirov
>>>> Thu May 27 14:37:54 EDT 2004
>>>> Update of /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM
>>>> In directory pub.open-bio.org:/tmp/cvs-serv8640
>>>>
>>>> Modified Files:
>>>>     SiteMatrix.pm SiteMatrixI.pm
>>>> Log Message:
>>>> method added: get_all_vectors, all possible seq to satisfy the 
>>>> PFM   under a give threshold
>>>>
>>>> bioperl-live/Bio/Matrix/PSM SiteMatrix.pm,1.15,1.16   
>>>> SiteMatrixI.pm,1.7,1.8
>>>> ===================================================================
>>>> RCS file:   
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm,v
>>>> retrieving revision 1.15
>>>> retrieving revision 1.16
>>>> diff -u -r1.15 -r1.16
>>>> ---   
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm    
>>>>   2004/05/12 18:27:30    1.15
>>>> +++   
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm    
>>>>   2004/05/27 18:37:54    1.16
>>>> @@ -883,4 +883,48 @@
>>>>  return $score;
>>>>  }
>>>>
>>>> +
>>>> +=head2 get_all_vectors
>>>> +
>>>> + Title   : get_all_vectors
>>>> + Usage   :
>>>> + Function:  returns all possible sequence vectors to satisfy the  
>>>> PFM  under
>>>> +            a given threshold
>>>> + Throws  :  If threshold outside of 3..7 (no sense to do that)
>>>> + Example :  my @vectors=$self->get_all_vectors(4);
>>>> + Returns :  Array of strings
>>>> + Args    :  (optional) floating
>>>> +
>>>> +=cut
>>>> +
>>>> +sub get_all_vectors {
>>>> +    my $self=shift;
>>>> +    my $thresh=shift;
>>>> +  $self->throw("Out of range. Threshold should be >3 and 7<.\n") 
>>>> if   (($thresh<3) || ($thresh>7));
>>>> +  my @seq=split(//,$self->consensus($thresh));
>>>> +  my @perm;
>>>> +  $thresh=$thresh/10;
>>>> +  for my $i (0..@{$self->{probA}}) {
>>>> +    push @{$perm[$i]},'A' if ($self->{probA}->[$i]>$thresh);
>>>> +    push @{$perm[$i]},'C' if ($self->{probC}->[$i]>$thresh);
>>>> +    push @{$perm[$i]},'G' if ($self->{probG}->[$i]>$thresh);
>>>> +    push @{$perm[$i]},'T' if ($self->{probT}->[$i]>$thresh);
>>>> +    push @{$perm[$i]},'N' if  ($seq[$i] eq 'N');
>>>> +  }
>>>> +  my $fpos=shift @perm;
>>>> +  my @strings=@$fpos;
>>>> +  foreach my $pos (@perm) {
>>>> +    my @newstr;
>>>> +    foreach my $let (@$pos) {
>>>> +      foreach my $string (@strings) {
>>>> +        my $newstring = $string . $let;
>>>> +        push @newstr,$newstring;
>>>> +      }
>>>> +    }
>>>> +    @strings=@newstr;
>>>> +  }
>>>> +    return @strings;
>>>> +}
>>>> +
>>>> +
>>>>  1;
>>>>
>>>> ===================================================================
>>>> RCS file:   /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/ 
>>>> SiteMatrixI.pm,v
>>>> retrieving revision 1.7
>>>> retrieving revision 1.8
>>>> diff -u -r1.7 -r1.8
>>>> ---   
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm   
>>>>    2004/05/12 18:27:30    1.7
>>>> +++   
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm   
>>>>    2004/05/27 18:37:54    1.8
>>>> @@ -572,5 +572,21 @@
>>>>      $self->throw_not_implemented();
>>>>  }
>>>>
>>>> +=head2 get_all_vectors
>>>>
>>>> + Title   : get_all_vectors
>>>> + Usage   :
>>>> + Function:  returns all possible sequence vectors to satisfy the  
>>>> PFM  under
>>>> +            a given threshold
>>>> + Throws  :  If threshold outside of 3..7 (no sense to do that)
>>>> + Example :  my @vectors=$self->get_all_vectors(4);
>>>> + Returns :  Array of strings
>>>> + Args    :  (optional) floating
>>>> +
>>>> +=cut
>>>> +
>>>> +sub get_all_vectors {
>>>> + my $self = shift;
>>>> +    $self->throw_not_implemented();
>>>> +}
>>>>  1;
>>>>
>>>> _______________________________________________
>>>> Bioperl-guts-l mailing list
>>>> Bioperl-guts-l at portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>>
>>>>
>>> -- 
>>> Aaron J. Mackey, Ph.D.
>>> Dept. of Biology, Goddard 212
>>> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
>>> 415 S. University Avenue         office: 215-898-1205
>>> Philadelphia, PA  19104-6017     fax:    215-746-6697
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>> -- 
>> Stefan Kirov, Ph.D.
>> University of Tennessee/Oak Ridge National Laboratory
>> 1060 Commerce Park, Oak Ridge
>> TN 37830-8026
>> USA
>> tel +865 576 5120
>> fax +865 241 1965
>> e-mail: skirov at utk.edu
>> sao at ornl.gov
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> -- 
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov



More information about the Bioperl-l mailing list