[Bioperl-l] Bio::Restriction::Analysis.pm; 'sizes' method is broken

Marc Perry Marc.Perry at oicr.on.ca
Sat Nov 27 04:53:48 UTC 2010


Hi,

I was following along in the Bioperl tutorial, and in the POD for this module and I discovered that the 'sizes' method was not working as advertised (in my hands).  For the input file I was using the complete genome of bacteriophage lambda as a plain vanilla fasta file.  Here is the script I used:

=+=+=+=
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Restriction::Analysis;
use Bio::Restriction::Enzyme;
use Bio::Restriction::EnzymeCollection;

my $input = shift or die "No fasta file on command line.";

my $seqio_obj = Bio::SeqIO->new(-file   => $input,
                                                -format => "fasta" );

my $seq = $seqio_obj->next_seq;

my $all_enz = Bio::Restriction::EnzymeCollection->new();

my $h3 = $all_enz->get_enzyme('HindIII');

my $ra = Bio::Restriction::Analysis->new(-seq => $seq);

print join "\n", $ra->sizes($h3), "\n\n";
print join "\n", $ra->sizes($h3, 0, 1), "\n\n";
print join "\n", $ra->sizes($h3, 1, 1), "\n\n";

exit;
=+=+=+=

And here is the output:

48502


48502


48.502

As I recall, there are six HindIII sites in lambda, which should yield 7 fragments from a linear genome.  Here is the subroutine code from github:


sub sizes {

    my ($self, $enz, $kb, $sort) = @_;

    $self->throw('no enzyme selected to get fragments for')

        unless $enz;

    $self->cut unless $self->{'_cut'};

    my @frag; my $lastsite=0;

    foreach my $site (@{$self->{'_cut_positions'}->{$enz}}) { # BUG

      $kb ? push (@frag, (int($site-($lastsite))/100)/10)

          : push (@frag, $site-($lastsite));

      $lastsite=$site;

    }

    $kb ? push (@frag, (int($self->{'_seq'}->length-($lastsite))/100)/10)

        : push (@frag, $self->{'_seq'}->length-($lastsite));

    if ($self->{'_seq'}->is_circular) {

       my $first=shift @frag;

       my $last=pop @frag;

       push @frag, ($first+$last);

    }

    $sort ? @frag = sort {$b <=> $a} @frag : 1;



    return @frag;

}


I eventually tracked the bug down to the indicated line.  As written, we are feeding in the enzyme object as a hash key instead of the enzyme's name, and everybody is unhappy.  Here is the solution that I hacked out:

my $name = $enz->{_seq}->{display_id};
foreach my $site (@{$self->{'_cut_positions'}->{$name}}) {

And now my script yields this output:

23130
2027
2322
9416
564
6682
4361


23130
9416
6682
4361
2322
2027
564


23.13
9.416
6.682
4.361
2.322
2.027
0.564

which makes me happier.  Oh, the example shown in the POD is also incorrect; using it like this throws an exception:


You should be able to do these:



  # to see all the fragment sizes,

  print join "\n", @{$re->sizes($enz)}, "\n";

  # to see all the fragment sizes sorted

  print join "\n", @{$re->sizes($enz, 0, 1)}, "\n";

  # to see all the fragment sizes in kb sorted

  print join "\n", @{$re->sizes($enz, 1, 1)}, "\n";

--Marc

Marc Perry
Scientific Associate

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel:       416-673-8593
Toll-free:  1-866-678-6427
Cell:      416-904-8037
www.oicr.on.ca<http://www.oicr.on.ca/>

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all





More information about the Bioperl-l mailing list