[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
Aaron J. Mackey
amackey at pcbi.upenn.edu
Thu May 27 18:06:45 EDT 2004
OK, so there's nothing wrong with me asking for a threshold less than
0.3, it just means that I may get lots of N's, but maybe not (depends
on the profile). I don't think those limits should be hard coded
(rather, they should be limits of 0 and 1.0). 0.0 gives me all N's,
while 1.0 gives me undef (though 0.99 could give me a sequence back if
all the positions have one residue with p > 0.99). -0.05 or 1.2 or 10,
etc. should throw.
What I meant about using Bio::Tools::IUPAC is that from the threshold
and prob. matrix, you should be able to generate a single nucleotide
sequence with IUPAC ambiguity codes (i.e. "N", "R", "Y", etc). Perhaps
that's all you should do, and let the user then do what they want with
it (perhaps give it to Bio::Tools::IUPAC to get the seq stream, but
perhaps not). In which case, you'd rename this to get_consensus() or
somesuch.
-Aaron
On May 27, 2004, at 5:25 PM, Stefan Kirov wrote:
> Hi Aaron,
> Yes, it might be a bit weird. Bio::Matrix::PSM is nucleotide specific,
> but you are right that the method is essentially the same as the seq
> stream, provided by Bio::Tools::IUPAC, except for the threshold. And
> yes, maybe it makes sense to make the threshold 0.3 instead of 3. It's
> just the way I have used it so far. It is not hard to change it. Range
> for the thresholds: I guess this is more nucleotide specific stuff- a
> frequency threshold under 0.3 is likely to get almost everything
> accepted (so you get N), where seq threshold above 0.7 is likely to
> give you a sequence equal to what Bio::Matrix::PSM::consensus will
> give you. Why not use the Bio::Tools::IUPAC- because of the threshold.
> I guess I could generate several IUPAC consensus sequences and do then
> the Bio::Tools::IUPAC::next_seq, but this is much more
> straightforward. On the other hand maybe Bio::Tools::IUPAC could be
> extended to accept threshold?
> Stefan
>
> Aaron J. Mackey wrote:
>
>>
>> Hi Stefan,
>>
>> I have a few questions about this latest commit; I'm sure it does
>> what you need it to do, but it's a little "crufty".
>>
>> What does this mean, why would you provide a probability threshold in
>> whole integers, and why are values outside of 3 and 7 illegal? Is
>> Bio::Matrix::PSM nucleotide specific? Why wouldn't this
>> "get_all_vectors" method be useful for any PSM? Why not use
>> Bio::Tools::IUPAC to generate a sequence stream from a calculated
>> consensus sequence?
>>
>> -Aaron
>>
>> On May 27, 2004, at 2:37 PM, Stefan Kirov wrote:
>>
>>>
>>> skirov
>>> Thu May 27 14:37:54 EDT 2004
>>> Update of /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM
>>> In directory pub.open-bio.org:/tmp/cvs-serv8640
>>>
>>> Modified Files:
>>> SiteMatrix.pm SiteMatrixI.pm
>>> Log Message:
>>> method added: get_all_vectors, all possible seq to satisfy the PFM
>>> under a give threshold
>>>
>>> bioperl-live/Bio/Matrix/PSM SiteMatrix.pm,1.15,1.16
>>> SiteMatrixI.pm,1.7,1.8
>>> ===================================================================
>>> RCS file:
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm,v
>>> retrieving revision 1.15
>>> retrieving revision 1.16
>>> diff -u -r1.15 -r1.16
>>> ---
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm
>>> 2004/05/12 18:27:30 1.15
>>> +++
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm
>>> 2004/05/27 18:37:54 1.16
>>> @@ -883,4 +883,48 @@
>>> return $score;
>>> }
>>>
>>> +
>>> +=head2 get_all_vectors
>>> +
>>> + Title : get_all_vectors
>>> + Usage :
>>> + Function: returns all possible sequence vectors to satisfy the
>>> PFM under
>>> + a given threshold
>>> + Throws : If threshold outside of 3..7 (no sense to do that)
>>> + Example : my @vectors=$self->get_all_vectors(4);
>>> + Returns : Array of strings
>>> + Args : (optional) floating
>>> +
>>> +=cut
>>> +
>>> +sub get_all_vectors {
>>> + my $self=shift;
>>> + my $thresh=shift;
>>> + $self->throw("Out of range. Threshold should be >3 and 7<.\n") if
>>> (($thresh<3) || ($thresh>7));
>>> + my @seq=split(//,$self->consensus($thresh));
>>> + my @perm;
>>> + $thresh=$thresh/10;
>>> + for my $i (0..@{$self->{probA}}) {
>>> + push @{$perm[$i]},'A' if ($self->{probA}->[$i]>$thresh);
>>> + push @{$perm[$i]},'C' if ($self->{probC}->[$i]>$thresh);
>>> + push @{$perm[$i]},'G' if ($self->{probG}->[$i]>$thresh);
>>> + push @{$perm[$i]},'T' if ($self->{probT}->[$i]>$thresh);
>>> + push @{$perm[$i]},'N' if ($seq[$i] eq 'N');
>>> + }
>>> + my $fpos=shift @perm;
>>> + my @strings=@$fpos;
>>> + foreach my $pos (@perm) {
>>> + my @newstr;
>>> + foreach my $let (@$pos) {
>>> + foreach my $string (@strings) {
>>> + my $newstring = $string . $let;
>>> + push @newstr,$newstring;
>>> + }
>>> + }
>>> + @strings=@newstr;
>>> + }
>>> + return @strings;
>>> +}
>>> +
>>> +
>>> 1;
>>>
>>> ===================================================================
>>> RCS file:
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/
>>> SiteMatrixI.pm,v
>>> retrieving revision 1.7
>>> retrieving revision 1.8
>>> diff -u -r1.7 -r1.8
>>> ---
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm
>>> 2004/05/12 18:27:30 1.7
>>> +++
>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm
>>> 2004/05/27 18:37:54 1.8
>>> @@ -572,5 +572,21 @@
>>> $self->throw_not_implemented();
>>> }
>>>
>>> +=head2 get_all_vectors
>>>
>>> + Title : get_all_vectors
>>> + Usage :
>>> + Function: returns all possible sequence vectors to satisfy the
>>> PFM under
>>> + a given threshold
>>> + Throws : If threshold outside of 3..7 (no sense to do that)
>>> + Example : my @vectors=$self->get_all_vectors(4);
>>> + Returns : Array of strings
>>> + Args : (optional) floating
>>> +
>>> +=cut
>>> +
>>> +sub get_all_vectors {
>>> + my $self = shift;
>>> + $self->throw_not_implemented();
>>> +}
>>> 1;
>>>
>>> _______________________________________________
>>> Bioperl-guts-l mailing list
>>> Bioperl-guts-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>
>>>
>> --
>> Aaron J. Mackey, Ph.D.
>> Dept. of Biology, Goddard 212
>> University of Pennsylvania email: amackey at pcbi.upenn.edu
>> 415 S. University Avenue office: 215-898-1205
>> Philadelphia, PA 19104-6017 fax: 215-746-6697
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Stefan Kirov, Ph.D.
> University of Tennessee/Oak Ridge National Laboratory
> 1060 Commerce Park, Oak Ridge
> TN 37830-8026
> USA
> tel +865 576 5120
> fax +865 241 1965
> e-mail: skirov at utk.edu
> sao at ornl.gov
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania email: amackey at pcbi.upenn.edu
415 S. University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697
More information about the Bioperl-l
mailing list