[Bioperl-l] Performance of Bio::Species
Sendu Bala
bix at sendu.me.uk
Sat Nov 25 12:47:28 UTC 2006
Jason Stajich wrote:
> Can we just weaken the references with Scalar::Util? This should solve
> the problem for circular refs.
I don't know about Stefan's problem, but I tried weakening refs - it
fixed the memory leak I was seeing, but caused other problems.
> Is Scalar::Util part of the core distro in the min perl we are supporting?
Yes.
> I can add this in Bio::Tree::Node and look around to see where else it
> is a problem. We just need a simple script to verify it is having an
> effect (i.e. a bug report with this).
perl -w -MBio::SeqIO -e 'my $si = Bio::SeqIO->new(-file =>
"5UTR.Pln_nr.dat", -format => "embl"); while (my $seq = $si->next_seq) {
$seq->id; }'
Here 5UTR.Pln_nr.dat is a large EMBL file with ~50,000 sequences. For me
this takes ~11 minutes to parse and uses ~2 GB of memory.
Once I weakened refs in all the places I could find in Bio::Tree::Node
and Bio::Tree::Tree, memory use stayed constant at 0.3% but the run still
took around 11 minutes. However, lots of the tests in the test suite then
fail, because Nodes are often made purely to add into a Tree, with the
requirement that the Tree keeps hard refs to them all (else the Tree
would fall apart).
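
To illustrate that failure mode, here is a minimal sketch. It does not use
the Bio::Tree classes: add_node_weakly() and @nodes are hypothetical
stand-ins for a container that only holds weak refs to what is added to it.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in for a tree that only holds weak refs to its nodes.
my @nodes;

sub add_node_weakly {
    my ($node) = @_;
    push @nodes, $node;
    weaken($nodes[-1]);    # this slot no longer keeps the node alive
    return;
}

{
    my $temp = { id => 'A' };    # node made purely to be added
    add_node_weakly($temp);
}    # the only strong ref ($temp) goes out of scope here

print defined $nodes[0] ? "node kept\n" : "node lost\n";
```

Once the caller's variable goes out of scope nothing owns the node, so the
weak slot becomes undef and the script prints "node lost".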
I think the Tree actually only keeps a ref to its root Node, which means
Nodes in general must keep hard refs to their Descendants. With that
constraint, I haven't been able to break the reference cycle and get these
things to clean up.
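
One common pattern for this shape of problem is to keep the parent-to-child
refs strong and weaken only the child-to-parent back-ref. The sketch below
uses a hypothetical My::Node class with made-up add_child() and parent()
methods. It is not the Bio::Tree::Node API, and I am only assuming the same
layout could be applied there.

```perl
use strict;
use warnings;

package My::Node;    # hypothetical minimal class, not Bio::Tree::Node
use Scalar::Util qw(weaken);

our $destroyed = 0;    # counts DESTROY calls so cleanup can be observed

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, children => [], parent => undef }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;    # strong: parent owns child
    $child->{parent} = $self;
    weaken($child->{parent});               # weak: child does not own parent
    return $child;
}

sub parent { return $_[0]{parent} }

sub DESTROY { $destroyed++ }

package main;

sub build_and_drop {
    my $root = My::Node->new(id => 'root');
    my $kid  = $root->add_child(My::Node->new(id => 'kid'));
    $kid->add_child(My::Node->new(id => 'leaf'));
    return;    # $root and $kid go out of scope here
}

build_and_drop();
print "destroyed: $My::Node::destroyed\n";
```

With the back-ref weakened, all three nodes are destroyed when the root goes
out of scope. The trade-off is that code holding only a descendant no longer
keeps its ancestors alive: parent() returns undef once the root is dropped,
which could be the kind of thing that makes existing tests fail.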
Hopefully I'm missing something obvious; please look into it.