[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit

Stefan Kirov skirov at utk.edu
Thu May 27 17:25:14 EDT 2004


Hi Aaron,
Yes, it might be a bit weird. Bio::Matrix::PSM is nucleotide specific, 
but you are right that the method is essentially the same as the seq 
stream, provided by Bio::Tools::IUPAC, except for the threshold. And 
yes, maybe it makes sense to make the threshold 0.3 instead of 3. It's 
just the way I have used it so far. It is not hard to change it. Range 
for the thresholds: I guess this is more nucleotide specific stuff- a 
frequency threshold under 0.3 is likely to get almost everything 
accepted (so you get N), where seq threshold above  0.7 is likely to 
give you a sequence equal to what Bio::Matrix::PSM::consensus will give 
you. Why not use the Bio::Tools::IUPAC- because of the threshold. I 
guess I could generate several IUPAC consensus sequences and do then the 
Bio::Tools::IUPAC::next_seq, but this is much more straightforward. On 
the other hand maybe Bio::Tools::IUPAC could be extended to accept 
threshold?
Stefan

Aaron J. Mackey wrote:

>
> Hi Stefan,
>
> I have a few questions about this latest commit; I'm sure it does 
> what  you need it to do, but it's a little "crufty".
>
> What does this mean, why would you provide a probability threshold in  
> whole integers, and why are values outside of 3 and 7 illegal?  Is  
> Bio::Matrix::PSM nucleotide specific?  Why wouldn't this  
> "get_all_vectors" method be useful for any PSM?  Why not use  
> Bio::Tools::IUPAC to generate a sequence stream from a calculated  
> consensus sequence?
>
> -Aaron
>
> On May 27, 2004, at 2:37 PM, Stefan Kirov wrote:
>
>>
>> skirov
>> Thu May 27 14:37:54 EDT 2004
>> Update of /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM
>> In directory pub.open-bio.org:/tmp/cvs-serv8640
>>
>> Modified Files:
>>     SiteMatrix.pm SiteMatrixI.pm
>> Log Message:
>> method added: get_all_vectors, all possible seq to satisfy the PFM  
>> under a give threshold
>>
>> bioperl-live/Bio/Matrix/PSM SiteMatrix.pm,1.15,1.16  
>> SiteMatrixI.pm,1.7,1.8
>> ===================================================================
>> RCS file:  
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm,v
>> retrieving revision 1.15
>> retrieving revision 1.16
>> diff -u -r1.15 -r1.16
>> ---  
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm     
>> 2004/05/12 18:27:30    1.15
>> +++  
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm     
>> 2004/05/27 18:37:54    1.16
>> @@ -883,4 +883,48 @@
>>  return $score;
>>  }
>>
>> +
>> +=head2 get_all_vectors
>> +
>> + Title   : get_all_vectors
>> + Usage   :
>> + Function:  returns all possible sequence vectors to satisfy the 
>> PFM  under
>> +            a given threshold
>> + Throws  :  If threshold outside of 3..7 (no sense to do that)
>> + Example :  my @vectors=$self->get_all_vectors(4);
>> + Returns :  Array of strings
>> + Args    :  (optional) floating
>> +
>> +=cut
>> +
>> +sub get_all_vectors {
>> +    my $self=shift;
>> +    my $thresh=shift;
>> +  $self->throw("Out of range. Threshold should be >3 and 7<.\n") if  
>> (($thresh<3) || ($thresh>7));
>> +  my @seq=split(//,$self->consensus($thresh));
>> +  my @perm;
>> +  $thresh=$thresh/10;
>> +  for my $i (0..@{$self->{probA}}) {
>> +    push @{$perm[$i]},'A' if ($self->{probA}->[$i]>$thresh);
>> +    push @{$perm[$i]},'C' if ($self->{probC}->[$i]>$thresh);
>> +    push @{$perm[$i]},'G' if ($self->{probG}->[$i]>$thresh);
>> +    push @{$perm[$i]},'T' if ($self->{probT}->[$i]>$thresh);
>> +    push @{$perm[$i]},'N' if  ($seq[$i] eq 'N');
>> +  }
>> +  my $fpos=shift @perm;
>> +  my @strings=@$fpos;
>> +  foreach my $pos (@perm) {
>> +    my @newstr;
>> +    foreach my $let (@$pos) {
>> +      foreach my $string (@strings) {
>> +        my $newstring = $string . $let;
>> +        push @newstr,$newstring;
>> +      }
>> +    }
>> +    @strings=@newstr;
>> +  }
>> +    return @strings;
>> +}
>> +
>> +
>>  1;
>>
>> ===================================================================
>> RCS file:  
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm,v
>> retrieving revision 1.7
>> retrieving revision 1.8
>> diff -u -r1.7 -r1.8
>> ---  
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm     
>> 2004/05/12 18:27:30    1.7
>> +++  
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm     
>> 2004/05/27 18:37:54    1.8
>> @@ -572,5 +572,21 @@
>>      $self->throw_not_implemented();
>>  }
>>
>> +=head2 get_all_vectors
>>
>> + Title   : get_all_vectors
>> + Usage   :
>> + Function:  returns all possible sequence vectors to satisfy the 
>> PFM  under
>> +            a given threshold
>> + Throws  :  If threshold outside of 3..7 (no sense to do that)
>> + Example :  my @vectors=$self->get_all_vectors(4);
>> + Returns :  Array of strings
>> + Args    :  (optional) floating
>> +
>> +=cut
>> +
>> +sub get_all_vectors {
>> + my $self = shift;
>> +    $self->throw_not_implemented();
>> +}
>>  1;
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>>
>>
> -- 
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov



More information about the Bioperl-l mailing list