[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
Stefan Kirov
skirov at utk.edu
Thu May 27 17:25:14 EDT 2004
Hi Aaron,
Yes, it might be a bit weird. Bio::Matrix::PSM is nucleotide specific,
but you are right that the method is essentially the same as the seq
stream provided by Bio::Tools::IUPAC, except for the threshold. And
yes, it probably makes sense to take the threshold as 0.3 instead of 3.
That is just the way I have used it so far, and it is not hard to
change. As for the range of the thresholds, I guess this is more
nucleotide-specific stuff: a frequency threshold under 0.3 is likely to
get almost everything accepted (so you get N), whereas a threshold
above 0.7 is likely to give you a sequence equal to what
Bio::Matrix::PSM::consensus will give you. Why not use
Bio::Tools::IUPAC? Because of the threshold. I guess I could generate
several IUPAC consensus sequences and then call
Bio::Tools::IUPAC::next_seq on each, but this is much more
straightforward. On the other hand, maybe Bio::Tools::IUPAC could be
extended to accept a threshold?
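
To make the threshold idea concrete, here is a minimal standalone sketch
of the enumeration with a fractional threshold (0.3 rather than 3). It
is not the committed code: the @pfm layout and the all_vectors name are
hypothetical stand-ins for the probA/probC/probG/probT arrays, and
falling back to 'N' when no base passes is an assumption made for this
example.

```perl
use strict;
use warnings;

# Hypothetical frequency matrix: one hash of base frequencies per position.
my @pfm = (
    { A => 0.9, C => 0.1, G => 0.0, T => 0.0 },
    { A => 0.4, C => 0.4, G => 0.1, T => 0.1 },
    { A => 0.0, C => 0.0, G => 0.5, T => 0.5 },
);

# Return every string whose base at each position has frequency > $thresh.
# $thresh is a fraction such as 0.3. Assumption: a position where no base
# passes the threshold contributes 'N'.
sub all_vectors {
    my ($pfm, $thresh) = @_;
    my @strings = ('');
    for my $pos (@$pfm) {
        my @bases = grep { $pos->{$_} > $thresh } qw(A C G T);
        @bases = ('N') unless @bases;
        # Extend every partial string with every accepted base.
        my @next;
        for my $s (@strings) {
            push @next, $s . $_ for @bases;
        }
        @strings = @next;
    }
    return @strings;
}

print join(',', all_vectors(\@pfm, 0.3)), "\n";
```

With the toy matrix above this prints AAG,AAT,ACG,ACT. The number of
strings is the product of the accepted bases per position, so a low
threshold on a long matrix grows quickly.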
Stefan
Aaron J. Mackey wrote:
>
> Hi Stefan,
>
> I have a few questions about this latest commit; I'm sure it does
> what you need it to do, but it's a little "crufty".
>
> What does this mean, why would you provide a probability threshold in
> whole integers, and why are values outside of 3 and 7 illegal? Is
> Bio::Matrix::PSM nucleotide specific? Why wouldn't this
> "get_all_vectors" method be useful for any PSM? Why not use
> Bio::Tools::IUPAC to generate a sequence stream from a calculated
> consensus sequence?
>
> -Aaron
>
> On May 27, 2004, at 2:37 PM, Stefan Kirov wrote:
>
>>
>> skirov
>> Thu May 27 14:37:54 EDT 2004
>> Update of /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM
>> In directory pub.open-bio.org:/tmp/cvs-serv8640
>>
>> Modified Files:
>> SiteMatrix.pm SiteMatrixI.pm
>> Log Message:
>> method added: get_all_vectors, all possible seq to satisfy the PFM
>> under a give threshold
>>
>> bioperl-live/Bio/Matrix/PSM SiteMatrix.pm,1.15,1.16
>> SiteMatrixI.pm,1.7,1.8
>> ===================================================================
>> RCS file:
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm,v
>> retrieving revision 1.15
>> retrieving revision 1.16
>> diff -u -r1.15 -r1.16
>> ---
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm
>> 2004/05/12 18:27:30 1.15
>> +++
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm
>> 2004/05/27 18:37:54 1.16
>> @@ -883,4 +883,48 @@
>> return $score;
>> }
>>
>> +
>> +=head2 get_all_vectors
>> +
>> + Title : get_all_vectors
>> + Usage :
>> + Function: returns all possible sequence vectors to satisfy the PFM under
>> + a given threshold
>> + Throws : If threshold outside of 3..7 (no sense to do that)
>> + Example : my @vectors=$self->get_all_vectors(4);
>> + Returns : Array of strings
>> + Args : (optional) floating
>> +
>> +=cut
>> +
>> +sub get_all_vectors {
>> + my $self=shift;
>> + my $thresh=shift;
>> + $self->throw("Out of range. Threshold should be >3 and 7<.\n") if (($thresh<3) || ($thresh>7));
>> + my @seq=split(//,$self->consensus($thresh));
>> + my @perm;
>> + $thresh=$thresh/10;
>> + for my $i (0..@{$self->{probA}}) {
>> + push @{$perm[$i]},'A' if ($self->{probA}->[$i]>$thresh);
>> + push @{$perm[$i]},'C' if ($self->{probC}->[$i]>$thresh);
>> + push @{$perm[$i]},'G' if ($self->{probG}->[$i]>$thresh);
>> + push @{$perm[$i]},'T' if ($self->{probT}->[$i]>$thresh);
>> + push @{$perm[$i]},'N' if ($seq[$i] eq 'N');
>> + }
>> + my $fpos=shift @perm;
>> + my @strings=@$fpos;
>> + foreach my $pos (@perm) {
>> + my @newstr;
>> + foreach my $let (@$pos) {
>> + foreach my $string (@strings) {
>> + my $newstring = $string . $let;
>> + push @newstr,$newstring;
>> + }
>> + }
>> + @strings=@newstr;
>> + }
>> + return @strings;
>> +}
>> +
>> +
>> 1;
>>
>> ===================================================================
>> RCS file:
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm,v
>> retrieving revision 1.7
>> retrieving revision 1.8
>> diff -u -r1.7 -r1.8
>> ---
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm
>> 2004/05/12 18:27:30 1.7
>> +++
>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm
>> 2004/05/27 18:37:54 1.8
>> @@ -572,5 +572,21 @@
>> $self->throw_not_implemented();
>> }
>>
>> +=head2 get_all_vectors
>>
>> + Title : get_all_vectors
>> + Usage :
>> + Function: returns all possible sequence vectors to satisfy the PFM under
>> + a given threshold
>> + Throws : If threshold outside of 3..7 (no sense to do that)
>> + Example : my @vectors=$self->get_all_vectors(4);
>> + Returns : Array of strings
>> + Args : (optional) floating
>> +
>> +=cut
>> +
>> +sub get_all_vectors {
>> + my $self = shift;
>> + $self->throw_not_implemented();
>> +}
>> 1;
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>>
>>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania email: amackey at pcbi.upenn.edu
> 415 S. University Avenue office: 215-898-1205
> Philadelphia, PA 19104-6017 fax: 215-746-6697
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov