[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
Stefan Kirov
skirov at utk.edu
Thu May 27 18:18:56 EDT 2004
Aaron J. Mackey wrote:
>
> OK, so there's nothing wrong with me asking for a threshold less than
> 0.3, it just means that I may get lots of N's, but maybe not (depends
> on the profile). I don't think those limits should be hard coded
> (rather, they should be limits of 0 and 1.0). 0.0 gives me all N's,
> while 1.0 gives me undef (though 0.99 could give me a sequence back
> if all the positions have one residue with p > 0.99). -0.05 or 1.2
> or 10, etc. should throw.
>
Done.
> What I meant about using Bio::Tools::IUPAC is that from the threshold
> and prob. matrix, you should be able to generate a single nucleotide
> sequence with IUPAC ambiguity codes (i.e. "N", "R", "Y", etc).
> Perhaps that's all you should do, and let the user then do what they
> want with it (perhaps give it to Bio::Tools::IUPAC to get the seq
> stream, but perhaps not). In which case, you'd rename this to
> get_consensus() or somesuch.
>
Yes, I got this, there is $matrix->IUPAC method that will do that.
However, I find the generation of set of sequences through a threshold
less confusing than playing with the same threshold to generate less or
more relaxed IUPAC consensus. I guess there is place for both
approaches, but this one more convinient for me.
Stefan
> -Aaron
>
> On May 27, 2004, at 5:25 PM, Stefan Kirov wrote:
>
>> Hi Aaron,
>> Yes, it might be a bit weird. Bio::Matrix::PSM is nucleotide
>> specific, but you are right that the method is essentially the same
>> as the seq stream, provided by Bio::Tools::IUPAC, except for the
>> threshold. And yes, maybe it makes sense to make the threshold 0.3
>> instead of 3. It's just the way I have used it so far. It is not
>> hard to change it. Range for the thresholds: I guess this is more
>> nucleotide specific stuff- a frequency threshold under 0.3 is likely
>> to get almost everything accepted (so you get N), where seq
>> threshold above 0.7 is likely to give you a sequence equal to what
>> Bio::Matrix::PSM::consensus will give you. Why not use the
>> Bio::Tools::IUPAC- because of the threshold. I guess I could
>> generate several IUPAC consensus sequences and do then the
>> Bio::Tools::IUPAC::next_seq, but this is much more straightforward.
>> On the other hand maybe Bio::Tools::IUPAC could be extended to
>> accept threshold?
>> Stefan
>>
>> Aaron J. Mackey wrote:
>>
>>>
>>> Hi Stefan,
>>>
>>> I have a few questions about this latest commit; I'm sure it does
>>> what you need it to do, but it's a little "crufty".
>>>
>>> What does this mean, why would you provide a probability threshold
>>> in whole integers, and why are values outside of 3 and 7 illegal?
>>> Is Bio::Matrix::PSM nucleotide specific? Why wouldn't this
>>> "get_all_vectors" method be useful for any PSM? Why not use
>>> Bio::Tools::IUPAC to generate a sequence stream from a calculated
>>> consensus sequence?
>>>
>>> -Aaron
>>>
>>> On May 27, 2004, at 2:37 PM, Stefan Kirov wrote:
>>>
>>>>
>>>> skirov
>>>> Thu May 27 14:37:54 EDT 2004
>>>> Update of /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM
>>>> In directory pub.open-bio.org:/tmp/cvs-serv8640
>>>>
>>>> Modified Files:
>>>> SiteMatrix.pm SiteMatrixI.pm
>>>> Log Message:
>>>> method added: get_all_vectors, all possible seq to satisfy the
>>>> PFM under a give threshold
>>>>
>>>> bioperl-live/Bio/Matrix/PSM SiteMatrix.pm,1.15,1.16
>>>> SiteMatrixI.pm,1.7,1.8
>>>> ===================================================================
>>>> RCS file:
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm,v
>>>> retrieving revision 1.15
>>>> retrieving revision 1.16
>>>> diff -u -r1.15 -r1.16
>>>> ---
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm
>>>> 2004/05/12 18:27:30 1.15
>>>> +++
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrix.pm
>>>> 2004/05/27 18:37:54 1.16
>>>> @@ -883,4 +883,48 @@
>>>> return $score;
>>>> }
>>>>
>>>> +
>>>> +=head2 get_all_vectors
>>>> +
>>>> + Title : get_all_vectors
>>>> + Usage :
>>>> + Function: returns all possible sequence vectors to satisfy the
>>>> PFM under
>>>> + a given threshold
>>>> + Throws : If threshold outside of 3..7 (no sense to do that)
>>>> + Example : my @vectors=$self->get_all_vectors(4);
>>>> + Returns : Array of strings
>>>> + Args : (optional) floating
>>>> +
>>>> +=cut
>>>> +
>>>> +sub get_all_vectors {
>>>> + my $self=shift;
>>>> + my $thresh=shift;
>>>> + $self->throw("Out of range. Threshold should be >3 and 7<.\n")
>>>> if (($thresh<3) || ($thresh>7));
>>>> + my @seq=split(//,$self->consensus($thresh));
>>>> + my @perm;
>>>> + $thresh=$thresh/10;
>>>> + for my $i (0..@{$self->{probA}}) {
>>>> + push @{$perm[$i]},'A' if ($self->{probA}->[$i]>$thresh);
>>>> + push @{$perm[$i]},'C' if ($self->{probC}->[$i]>$thresh);
>>>> + push @{$perm[$i]},'G' if ($self->{probG}->[$i]>$thresh);
>>>> + push @{$perm[$i]},'T' if ($self->{probT}->[$i]>$thresh);
>>>> + push @{$perm[$i]},'N' if ($seq[$i] eq 'N');
>>>> + }
>>>> + my $fpos=shift @perm;
>>>> + my @strings=@$fpos;
>>>> + foreach my $pos (@perm) {
>>>> + my @newstr;
>>>> + foreach my $let (@$pos) {
>>>> + foreach my $string (@strings) {
>>>> + my $newstring = $string . $let;
>>>> + push @newstr,$newstring;
>>>> + }
>>>> + }
>>>> + @strings=@newstr;
>>>> + }
>>>> + return @strings;
>>>> +}
>>>> +
>>>> +
>>>> 1;
>>>>
>>>> ===================================================================
>>>> RCS file: /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/
>>>> SiteMatrixI.pm,v
>>>> retrieving revision 1.7
>>>> retrieving revision 1.8
>>>> diff -u -r1.7 -r1.8
>>>> ---
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm
>>>> 2004/05/12 18:27:30 1.7
>>>> +++
>>>> /home/repository/bioperl/bioperl-live/Bio/Matrix/PSM/SiteMatrixI.pm
>>>> 2004/05/27 18:37:54 1.8
>>>> @@ -572,5 +572,21 @@
>>>> $self->throw_not_implemented();
>>>> }
>>>>
>>>> +=head2 get_all_vectors
>>>>
>>>> + Title : get_all_vectors
>>>> + Usage :
>>>> + Function: returns all possible sequence vectors to satisfy the
>>>> PFM under
>>>> + a given threshold
>>>> + Throws : If threshold outside of 3..7 (no sense to do that)
>>>> + Example : my @vectors=$self->get_all_vectors(4);
>>>> + Returns : Array of strings
>>>> + Args : (optional) floating
>>>> +
>>>> +=cut
>>>> +
>>>> +sub get_all_vectors {
>>>> + my $self = shift;
>>>> + $self->throw_not_implemented();
>>>> +}
>>>> 1;
>>>>
>>>> _______________________________________________
>>>> Bioperl-guts-l mailing list
>>>> Bioperl-guts-l at portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>>
>>>>
>>> --
>>> Aaron J. Mackey, Ph.D.
>>> Dept. of Biology, Goddard 212
>>> University of Pennsylvania email: amackey at pcbi.upenn.edu
>>> 415 S. University Avenue office: 215-898-1205
>>> Philadelphia, PA 19104-6017 fax: 215-746-6697
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>> --
>> Stefan Kirov, Ph.D.
>> University of Tennessee/Oak Ridge National Laboratory
>> 1060 Commerce Park, Oak Ridge
>> TN 37830-8026
>> USA
>> tel +865 576 5120
>> fax +865 241 1965
>> e-mail: skirov at utk.edu
>> sao at ornl.gov
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania email: amackey at pcbi.upenn.edu
> 415 S. University Avenue office: 215-898-1205
> Philadelphia, PA 19104-6017 fax: 215-746-6697
>
--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov
More information about the Bioperl-l
mailing list