[Bioperl-l] Bio::Restriction::Analysis.pm; 'sizes' method is broken
    Marc Perry 
    Marc.Perry at oicr.on.ca
       
    Sat Nov 27 04:53:48 UTC 2010
    
    
  
Hi,
I was following along in the Bioperl tutorial, and in the POD for this module and I discovered that the 'sizes' method was not working as advertised (in my hands).  For the input file I was using the complete genome of bacteriophage lambda as a plain vanilla fasta file.  Here is the script I used:
=+=+=+=
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Restriction::Analysis;
use Bio::Restriction::Enzyme;
use Bio::Restriction::EnzymeCollection;
my $input = shift or die "No fasta file on command line.";
my $seqio_obj = Bio::SeqIO->new(-file   => $input,
                                                -format => "fasta" );
my $seq = $seqio_obj->next_seq;
my $all_enz = Bio::Restriction::EnzymeCollection->new();
my $h3 = $all_enz->get_enzyme('HindIII');
my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
print join "\n", $ra->sizes($h3), "\n\n";
print join "\n", $ra->sizes($h3, 0, 1), "\n\n";
print join "\n", $ra->sizes($h3, 1, 1), "\n\n";
exit;
=+=+=+=
And here is the output:
48502
48502
48.502
As I recall, there are six HindIII sites in lambda, which should yield 7 fragments from a linear genome.  Here is the subroutine code from github:
sub sizes {
    my ($self, $enz, $kb, $sort) = @_;
    $self->throw('no enzyme selected to get fragments for')
        unless $enz;
    $self->cut unless $self->{'_cut'};
    my @frag; my $lastsite=0;
    foreach my $site (@{$self->{'_cut_positions'}->{$enz}}) { # BUG
      $kb ? push (@frag, (int($site-($lastsite))/100)/10)
          : push (@frag, $site-($lastsite));
      $lastsite=$site;
    }
    $kb ? push (@frag, (int($self->{'_seq'}->length-($lastsite))/100)/10)
        : push (@frag, $self->{'_seq'}->length-($lastsite));
    if ($self->{'_seq'}->is_circular) {
       my $first=shift @frag;
       my $last=pop @frag;
       push @frag, ($first+$last);
    }
    $sort ? @frag = sort {$b <=> $a} @frag : 1;
    return @frag;
}
I eventually tracked the bug down to the indicated line.  As written, we are feeding in the enzyme object as a hash key instead of the enzyme's name, and everybody is unhappy.  Here is the solution that I hacked out:
my $name = $enz->{_seq}->{display_id};
foreach my $site (@{$self->{'_cut_positions'}->{$name}}) {
And now my script yields this output:
23130
2027
2322
9416
564
6682
4361
23130
9416
6682
4361
2322
2027
564
23.13
9.416
6.682
4.361
2.322
2.027
0.564
which makes me happier.  Oh, the example shown in the POD is also incorrect; using it like this throws an exception:
You should be able to do these:
  # to see all the fragment sizes,
  print join "\n", @{$re->sizes($enz)}, "\n";
  # to see all the fragment sizes sorted
  print join "\n", @{$re->sizes($enz, 0, 1)}, "\n";
  # to see all the fragment sizes in kb sorted
  print join "\n", @{$re->sizes($enz, 1, 1)}, "\n";
--Marc
Marc Perry
Scientific Associate
Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
Tel:       416-673-8593
Toll-free:  1-866-678-6427
Cell:      416-904-8037
www.oicr.on.ca<http://www.oicr.on.ca/>
This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all
    
    
More information about the Bioperl-l
mailing list