[Bioperl-l] Memory requirements for conversion from embl to genbank
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Thu Aug 31 15:07:14 UTC 2006
I observe the same. A test case is attached; please add it to the test cases.
It will be helpful in the future, when the parser should cope with
two /note feature lines.
M.
Sendu Bala wrote:
> Martin MOKREJŠ wrote:
>
>>Hi,
>> I use bp_sreformat.pl to convert a file from embl format
>>to genbank. I use the current cvs HEAD version and cannot parse
>>two files. Each record is small and I don't understand why
>>there is such a huge memory requirement. The machine has 1GB
>>RAM and is running a recent linux kernel. Moreover, I could
>>parse the same file with bioperl-1.5.1 after I manually
>>fixed some missing quotes in the file.
>
> [...]
>
>>$ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb
>
>
> The problem occurs simply doing
> $si = Bio::SeqIO->new(-format => "embl", -file => "file");
> while ($seq = $si->next_seq) { }
>
> [...]
>
>>I am not a perl guru, nor am I familiar with the bioperl code. Does someone know
>>whether the parsed records are held in memory or not? It seems so.
>>I guess the objects could be freed from memory by dropping the references
>>to them immediately after they are written out in the new format. Or
>>the garbage collector does not work well in perl 5.8.8.
>
>
> No, the bp_sreformat.pl code and similar, and perl itself are fine from
> a memory point of view. The problem is the new SeqIO parsing of taxonomic
> information. Not only is there a big memory leak, I've realised it is
> also fantastically slow. I'll come up with a fix shortly.
>
> Sorry,
> Sendu.
--
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: two_note_features.embl
Type: chemical/x-embl-dl-nucleotide
Size: 3643 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060831/4686615d/attachment-0004.bin>