[Bioperl-l] Bio::Restriction::Analysis.pm; 'sizes' method is broken

Chris Fields cjfields at illinois.edu
Sat Nov 27 05:15:46 UTC 2010


Marc,

I've entered this into bugzilla for tracking (the best place for this, BTW).  I can't promise when we'll get to it, but since there is a proposed fix it may be handled fairly soon.  Thanks for pointing this out!

chris

On Nov 26, 2010, at 10:53 PM, Marc Perry wrote:

> Hi,
> 
> I was following along in the Bioperl tutorial, and in the POD for this module and I discovered that the 'sizes' method was not working as advertised (in my hands).  For the input file I was using the complete genome of bacteriophage lambda as a plain vanilla fasta file.  Here is the script I used:
> 
> =+=+=+=
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use Bio::SeqIO;
> use Bio::Restriction::Analysis;
> use Bio::Restriction::Enzyme;
> use Bio::Restriction::EnzymeCollection;
> 
> my $input = shift or die "No fasta file on command line.";
> 
> my $seqio_obj = Bio::SeqIO->new(-file   => $input,
>                                                -format => "fasta" );
> 
> my $seq = $seqio_obj->next_seq;
> 
> my $all_enz = Bio::Restriction::EnzymeCollection->new();
> 
> my $h3 = $all_enz->get_enzyme('HindIII');
> 
> my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
> 
> print join "\n", $ra->sizes($h3), "\n\n";
> print join "\n", $ra->sizes($h3, 0, 1), "\n\n";
> print join "\n", $ra->sizes($h3, 1, 1), "\n\n";
> 
> exit;
> =+=+=+=
> 
> And here is the output:
> 
> 48502
> 
> 
> 48502
> 
> 
> 48.502
> 
> As I recall, there are six HindIII sites in lambda, which should yield 7 fragments from a linear genome.  Here is the subroutine code from github:
> 
> 
> sub sizes {
> 
>    my ($self, $enz, $kb, $sort) = @_;
> 
>    $self->throw('no enzyme selected to get fragments for')
> 
>        unless $enz;
> 
>    $self->cut unless $self->{'_cut'};
> 
>    my @frag; my $lastsite=0;
> 
>    foreach my $site (@{$self->{'_cut_positions'}->{$enz}}) { # BUG
> 
>      $kb ? push (@frag, (int($site-($lastsite))/100)/10)
> 
>          : push (@frag, $site-($lastsite));
> 
>      $lastsite=$site;
> 
>    }
> 
>    $kb ? push (@frag, (int($self->{'_seq'}->length-($lastsite))/100)/10)
> 
>        : push (@frag, $self->{'_seq'}->length-($lastsite));
> 
>    if ($self->{'_seq'}->is_circular) {
> 
>       my $first=shift @frag;
> 
>       my $last=pop @frag;
> 
>       push @frag, ($first+$last);
> 
>    }
> 
>    $sort ? @frag = sort {$b <=> $a} @frag : 1;
> 
> 
> 
>    return @frag;
> 
> }
> 
> 
> I eventually tracked the bug down to the indicated line.  As written, we are feeding in the enzyme object as a hash key instead of the enzyme's name, and everybody is unhappy.  Here is the solution that I hacked out:
> 
> my $name = $enz->{_seq}->{display_id};
> foreach my $site (@{$self->{'_cut_positions'}->{$name}}) {
> 
> And now my script yields this output:
> 
> 23130
> 2027
> 2322
> 9416
> 564
> 6682
> 4361
> 
> 
> 23130
> 9416
> 6682
> 4361
> 2322
> 2027
> 564
> 
> 
> 23.13
> 9.416
> 6.682
> 4.361
> 2.322
> 2.027
> 0.564
> 
> which makes me happier.  Oh, the example shown in the POD is also incorrect; using it like this throws an exception:
> 
> 
> You should be able to do these:
> 
> 
> 
>  # to see all the fragment sizes,
> 
>  print join "\n", @{$re->sizes($enz)}, "\n";
> 
>  # to see all the fragment sizes sorted
> 
>  print join "\n", @{$re->sizes($enz, 0, 1)}, "\n";
> 
>  # to see all the fragment sizes in kb sorted
> 
>  print join "\n", @{$re->sizes($enz, 1, 1)}, "\n";
> 
> --Marc
> 
> Marc Perry
> Scientific Associate
> 
> Ontario Institute for Cancer Research
> MaRS Centre, South Tower
> 101 College Street, Suite 800
> Toronto, Ontario, Canada M5G 0A3
> 
> Tel:       416-673-8593
> Toll-free:  1-866-678-6427
> Cell:      416-904-8037
> www.oicr.on.ca<http://www.oicr.on.ca/>
> 
> This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list