[Bioperl-l] Memory requirements for conversion from embl to genbank

Sendu Bala bix at sendu.me.uk
Thu Aug 31 14:14:59 UTC 2006


Martin MOKREJŠ wrote:
> Hi,
>   I use bp_sreformat.pl to convert a file from EMBL format
> to GenBank. I use the current CVS HEAD version and cannot parse
> two files. Each record is small and I don't understand why
> there is such a huge memory requirement. The machine has 1GB
> RAM and is running a recent Linux kernel. Moreover, I could
> parse the same file with bioperl-1.5.1 after I had manually
> fixed some missing quotes in the file.
[...]
> $ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb

The problem occurs simply by doing:
use Bio::SeqIO;
my $si = Bio::SeqIO->new(-format => "embl", -file => "file");
while (my $seq = $si->next_seq) { }

[...]
> I am not a Perl guru, nor am I familiar with the BioPerl code. Does someone
> know whether the parsed records are held in memory or not? It seems so.
> I guess the objects could be freed by releasing the references to them
> immediately after they are written out in the new format. Or perhaps the
> garbage collector does not work well in Perl 5.8.8.

No, bp_sreformat.pl and similar scripts, as well as Perl itself, are fine
from a memory point of view. The problem is the new SeqIO parsing of
taxonomic information. Not only does it have a big memory leak, I've
realised it is also fantastically slow. I'll come up with a fix shortly.
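The post does not say what causes the leak, so the following is only a hypothetical illustration, not the BioPerl code. Perl frees objects by reference counting, which is why a streaming loop like the one above normally releases each record as soon as the next one is read. One common way a parser can still leak is a reference cycle, for example a record that points to a taxonomy node that points back to the record. `Record`, `stream_records` and the taxon hash are invented for this sketch; `Scalar::Util::weaken` is the standard way to break such a cycle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in for a parsed sequence record; counts live instances.
package Record;
our $LIVE = 0;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    $LIVE++;
    return bless $self, $class;
}
sub DESTROY { $LIVE-- }

package main;

# Build $n fake records in a streaming loop and return how many of them are
# still alive afterwards (i.e. leaked by this call).
#   $cycle: attach a hypothetical taxon node that points back to its record,
#           creating a reference cycle that reference counting cannot free.
#   $weak:  weaken the back pointer, which breaks the cycle.
sub stream_records {
    my ($n, $cycle, $weak) = @_;
    my $before = $Record::LIVE;
    for my $i (1 .. $n) {
        my $rec = Record->new(id => "rec$i");
        if ($cycle) {
            my $taxon = { name => "taxon$i", record => $rec };
            weaken($taxon->{record}) if $weak;
            $rec->{taxon} = $taxon;
        }
        # ... a real converter would write $rec out in the new format here ...
    }   # $rec (and $taxon) go out of scope at the end of each iteration
    return $Record::LIVE - $before;
}

printf "no cycle:   %d records leaked\n", stream_records(1000, 0, 0);
printf "cycle:      %d records leaked\n", stream_records(1000, 1, 0);
printf "weak cycle: %d records leaked\n", stream_records(1000, 1, 1);
```

Running it prints 0, 1000 and 0 leaked records respectively: without a cycle every record is freed as the loop moves on, so memory stays flat however large the input file is, while the unweakened cycle keeps every record alive until the program exits.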

Sorry,
Sendu.


