[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit

Aaron J. Mackey amackey at pcbi.upenn.edu
Thu May 27 18:06:45 EDT 2004


OK, so there's nothing wrong with me asking for a threshold less than  
0.3, it just means that I may get lots of N's, but maybe not (depends  
on the profile).  I don't think those limits should be hard coded  
(rather, they should be limits of 0 and 1.0).  0.0 gives me all N's,  
while 1.0 gives me undef (though 0.99 could give me a sequence back if  
all the positions have one residue with p > 0.99).  -0.05 or 1.2 or 10,  
etc. should throw.

What I meant about using Bio::Tools::IUPAC is that from the threshold  
and prob. matrix, you should be able to generate a single nucleotide  
sequence with IUPAC ambiguity codes (i.e. "N", "R", "Y", etc).  Perhaps  
that's all you should do, and let the user then do what they want with  
it (perhaps give it to Bio::Tools::IUPAC to get the seq stream, but  
perhaps not).  In which case, you'd rename this to get_consensus() or  
somesuch.

-Aaron

On May 27, 2004, at 5:25 PM, Stefan Kirov wrote:

> Hi Aaron,
> Yes, it might be a bit weird. Bio::Matrix::PSM is nucleotide specific,  
> but you are right that the method is essentially the same as the seq  
> stream, provided by Bio::Tools::IUPAC, except for the threshold. And  
> yes, maybe it makes sense to make the threshold 0.3 instead of 3. It's  
> just the way I have used it so far. It is not hard to change it. Range  
> for the thresholds: I guess this is more nucleotide specific stuff- a  
> frequency threshold under 0.3 is likely to get almost everything  
> accepted (so you get N), where seq threshold above  0.7 is likely to  
> give you a sequence equal to what Bio::Matrix::PSM::consensus will  
> give you. Why not use the Bio::Tools::IUPAC- because of the threshold.  
> I guess I could generate several IUPAC consensus sequences and do then  
> the Bio::Tools::IUPAC::next_seq, but this is much more  
> straightforward. On the other hand maybe Bio::Tools::IUPAC could be  
> extended to accept threshold?
> Stefan
>
> Aaron J. Mackey wrote:
>
>>
>> Hi Stefan,
>>
>> I have a few questions about this latest commit; I'm sure it does  
>> what  you need it to do, but it's a little "crufty".
>>
>> What does this mean, why would you provide a probability threshold in  
>>  whole integers, and why are values outside of 3 and 7 illegal?  Is   
>> Bio::Matrix::PSM nucleotide specific?  Why wouldn't this   
>> "get_all_vectors" method be useful for any PSM?  Why not use   
>> Bio::Tools::IUPAC to generate a sequence stream from a calculated   
>> consensus sequence?
>>
>> -Aaron
>>
>> On May 27, 2004, at 2:37 PM, Stefan Kirov wrote:
>>
>>>
>>> skirov
>>> Thu May 27 14:37:54 EDT 2004
>>> Update of /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM
>>> In directory pub.open-bio.org:/tmp/cvs-serv8640
>>>
>>> Modified Files:
>>>     SiteMatrix.pm SiteMatrixI.pm
>>> Log Message:
>>> method added: get_all_vectors, all possible seq to satisfy the PFM   
>>> under a give threshold
>>>
>>> bioperl-live/Bio/Matrix/PSM SiteMatrix.pm,1.15,1.16   
>>> SiteMatrixI.pm,1.7,1.8
>>> ===================================================================
>>> RCS file:   
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm,v
>>> retrieving revision 1.15
>>> retrieving revision 1.16
>>> diff -u -r1.15 -r1.16
>>> ---   
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm    
>>>   2004/05/12 18:27:30    1.15
>>> +++   
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm    
>>>   2004/05/27 18:37:54    1.16
>>> @@ -883,4 +883,48 @@
>>>  return $score;
>>>  }
>>>
>>> +
>>> +=head2 get_all_vectors
>>> +
>>> + Title   : get_all_vectors
>>> + Usage   :
>>> + Function:  returns all possible sequence vectors to satisfy the  
>>> PFM  under
>>> +            a given threshold
>>> + Throws  :  If threshold outside of 3..7 (no sense to do that)
>>> + Example :  my @vectors=$self->get_all_vectors(4);
>>> + Returns :  Array of strings
>>> + Args    :  (optional) floating
>>> +
>>> +=cut
>>> +
>>> +sub get_all_vectors {
>>> +    my $self=shift;
>>> +    my $thresh=shift;
>>> +  $self->throw("Out of range. Threshold should be >3 and 7<.\n") if  
>>>  (($thresh<3) || ($thresh>7));
>>> +  my @seq=split(//,$self->consensus($thresh));
>>> +  my @perm;
>>> +  $thresh=$thresh/10;
>>> +  for my $i (0..@{$self->{probA}}) {
>>> +    push @{$perm[$i]},'A' if ($self->{probA}->[$i]>$thresh);
>>> +    push @{$perm[$i]},'C' if ($self->{probC}->[$i]>$thresh);
>>> +    push @{$perm[$i]},'G' if ($self->{probG}->[$i]>$thresh);
>>> +    push @{$perm[$i]},'T' if ($self->{probT}->[$i]>$thresh);
>>> +    push @{$perm[$i]},'N' if  ($seq[$i] eq 'N');
>>> +  }
>>> +  my $fpos=shift @perm;
>>> +  my @strings=@$fpos;
>>> +  foreach my $pos (@perm) {
>>> +    my @newstr;
>>> +    foreach my $let (@$pos) {
>>> +      foreach my $string (@strings) {
>>> +        my $newstring = $string . $let;
>>> +        push @newstr,$newstring;
>>> +      }
>>> +    }
>>> +    @strings=@newstr;
>>> +  }
>>> +    return @strings;
>>> +}
>>> +
>>> +
>>>  1;
>>>
>>> ===================================================================
>>> RCS file:   
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/ 
>>> SiteMatrixI.pm,v
>>> retrieving revision 1.7
>>> retrieving revision 1.8
>>> diff -u -r1.7 -r1.8
>>> ---   
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm   
>>>    2004/05/12 18:27:30    1.7
>>> +++   
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm   
>>>    2004/05/27 18:37:54    1.8
>>> @@ -572,5 +572,21 @@
>>>      $self->throw_not_implemented();
>>>  }
>>>
>>> +=head2 get_all_vectors
>>>
>>> + Title   : get_all_vectors
>>> + Usage   :
>>> + Function:  returns all possible sequence vectors to satisfy the  
>>> PFM  under
>>> +            a given threshold
>>> + Throws  :  If threshold outside of 3..7 (no sense to do that)
>>> + Example :  my @vectors=$self->get_all_vectors(4);
>>> + Returns :  Array of strings
>>> + Args    :  (optional) floating
>>> +
>>> +=cut
>>> +
>>> +sub get_all_vectors {
>>> + my $self = shift;
>>> +    $self->throw_not_implemented();
>>> +}
>>>  1;
>>>
>>> _______________________________________________
>>> Bioperl-guts-l mailing list
>>> Bioperl-guts-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>
>>>
>> -- 
>> Aaron J. Mackey, Ph.D.
>> Dept. of Biology, Goddard 212
>> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
>> 415 S. University Avenue         office: 215-898-1205
>> Philadelphia, PA  19104-6017     fax:    215-746-6697
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> Stefan Kirov, Ph.D.
> University of Tennessee/Oak Ridge National Laboratory
> 1060 Commerce Park, Oak Ridge
> TN 37830-8026
> USA
> tel +865 576 5120
> fax +865 241 1965
> e-mail: skirov at utk.edu
> sao at ornl.gov
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697



More information about the Bioperl-l mailing list