[Bioperl-l] A better fix for Species.pm memory leak, please REVIEW.
George Hartzell
hartzell at alerce.com
Sun Nov 23 00:21:19 UTC 2008
I dug back into the Species.pm memory leak and my less-than-ideal fix
for it.
First, in my defense, my previous fix only causes failures in
t/Species.t if network tests are enabled. It passes the generic
ones. I *thought* I'd run the tests before I committed :).
The basic cause of the problem is that when species() calls
Bio::Tree::Tree->new(-node=> $species_taxon), $species_taxon is a copy
of $self. The tree ends up adding that node onto the classification
list and it gets linked into the descendant links, which results in
a cycle.
If that lousy English description left you scratching your head,
here's a hacked-to-pieces version of R Voss's original test case.
You'll need Devel::Cycle (YAY LINCOLN!) to run it. Find yourself a
GenBank file (I used gbpri21.seq, from the original bug report), put
it into a directory somewhere and do something like
bug.pl GBDIR /tmp
(yeah, why do a while loop w/ an exit in the middle? Cut-n-paste.
Why else...?)
################################################################
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Devel::Cycle;

my $dir = shift @ARGV;        # the directory with GenBank (gb*) files
my $out = shift @ARGV;        # the directory to write to...
mkdir $out if not -d $out;    # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $file ( readdir $dirhandle ) {
    next if $file !~ /^gb.*/;

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );
    while ( my $seq = $reader->next_seq ) {
        # calling species() is what builds the tree (and the cycle)
        my $name = $seq->species->binomial;
        find_cycle($seq);    # report any cycles reachable from $seq
        exit 1;
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}
################################################################
Anyway. Here's a fix. Instead of adding $self to the tree, make a
new species node w/ $self's important bits and add that instead.
There's a patch below that gets rid of my weaken and Sendu's
fix-for-the-fix and then Does The Right Thing (I hope).
It seems to pass all of the Species.t tests.
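
For anyone who wants to see the difference between the two approaches without BioPerl in the way, here is a minimal sketch in plain Perl. The Node class and the hash layout are made up for illustration and are not how Bio::Species or Bio::Tree::Tree store things; the only point is that an object holding a structure that refers back to itself is never freed by reference counting, while a fresh copy breaks the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT BioPerl code.
package Node;
my $destroyed = 0;    # counts DESTROY calls (file-scoped lexical)
sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
sub DESTROY { $destroyed++ }

package main;

# Case 1: the node stores a "tree" whose node list contains the node
# itself, so $self -> tree -> $self forms a cycle.
{
    my $self = Node->new( name => 'leaky' );
    $self->{tree} = { nodes => [$self] };
}
print "after leaky scope: $destroyed destroyed\n";    # prints 0

# Case 2: the "tree" gets a fresh node built from the original's data,
# so nothing points back at $self.
{
    my $self = Node->new( name => 'clean' );
    my $copy = Node->new( name => $self->{name} );
    $self->{tree} = { nodes => [$copy] };
}
print "after clean scope: $destroyed destroyed\n";    # prints 2
```

In the first block the object outlives its scope, which is the leak; in the second both the original and the copy are destroyed as soon as the scope ends.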
The take-home exam question is: are there any other important bits
missing?
Thoughts?
I'll commit it if someone'll review it.
g.
Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm (revision 15008)
+++ Bio/Species.pm (working copy)
@@ -261,8 +261,13 @@
# work it out from our nodes
my $species_taxon = $self->{tree}->find_node(-rank => 'species');
unless ($species_taxon) {
- # just assume we are rank species
- $species_taxon = $self;
+ # whip up a new species object so that we don't
+ # end up with a cycle in the tree.
+ # initialize it with self's important bits.
+ # NOTE TO ALL: any missing important bits?
+ $species_taxon =
+ Bio::Species->new(-classification =>
+ [$self->classification]);
}
$species = $species_taxon->scientific_name;
@@ -278,7 +283,6 @@
$self->{tree} = Bio::Tree::Tree->new(-node => $species_taxon);
delete $self->{tree}->{_root_cleanup_methods};
$root = $self->{tree}->get_root_node;
- weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
}
my @spflds = split(' ', $species);
@@ -395,15 +399,6 @@
if ($ss_taxon) {
if ($sub) {
$ss_taxon->scientific_name($sub);
-
- # *** weakening ref to our root node in species() to solve a
- # memory leak means that we have a subspecies taxon to set
- # during the first call to species(), but it has vanished by
- # the time a user subsequently calls sub_species() to get the
- # value. So we 'cheat' and just store the subspecies name in
- # our self hash, instead of the tree. Is this a problem for
- # a Species object? Can't decide --sendu
- $self->{'_sub_species'} = $sub;
}
return $ss_taxon->scientific_name;
}