[Bioperl-l] Memory requirements for conversion from embl to genbank
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Thu Aug 31 12:44:58 UTC 2006
Hi,
I use bp_sreformat.pl to convert a file from EMBL format
to GenBank. I am using the current CVS HEAD version and cannot parse
two files. Each record is small, so I don't understand why
the memory requirement is so huge. The machine has 1GB
RAM and runs a recent Linux kernel. Moreover, I could
parse the same file with bioperl-1.5.1 once I had manually
fixed some missing quotes in the file.
With the current changes to the embl & genbank parsing (bug #2077)
I can no longer parse the file.
Here is the vmstat output from around the moment when the machine ran
out of memory and the Linux kernel killed the application:
1 0 803212 20936 8 2184 0 0 0 0 1062 38 99 1 0 0
1 0 803208 19944 8 2184 0 0 0 0 1062 38 100 0 0 0
1 0 803208 18828 8 2184 0 0 0 0 1061 37 100 0 0 0
1 0 803204 17836 8 2184 0 0 0 0 1062 40 100 0 0 0
1 0 803204 16844 8 2184 0 0 0 0 1062 48 100 0 0 0
1 0 803200 15728 8 2184 32 0 32 0 1063 41 100 0 0 0
1 0 803200 14736 8 2184 0 0 0 0 1062 41 99 1 0 0
1 0 803196 13744 8 2184 0 0 0 0 1061 38 100 0 0 0
1 0 803240 13640 8 2184 0 48 0 48 1063 68 99 1 0 0
1 1 803240 12920 8 1984 0 40 0 40 1065 136 100 0 0 0
1 1 803240 13192 8 1872 0 1056 0 1056 1114 326 96 4 0 0
1 1 803240 14448 8 1336 0 20 0 20 1081 192 90 10 0 0
1 1 803240 13656 8 1232 0 28 0 28 1070 104 87 13 0 0
1 1 803240 12892 8 1260 32 4 176 4 1069 113 86 14 0 0
0 4 803240 12144 8 1344 192 24 612 24 1088 185 44 16 0 40
0 7 803240 11952 8 1180 32 32 508 32 1113 591 46 23 0 32
0 3 803240 11948 8 1336 1120 500 10816 500 4390 1397 2 31 0 66
2 6 803240 12056 8 1788 752 136 9412 136 6101 1795 0 27 0 73
0 7 803240 12176 8 1748 12 0 2180 0 1132 326 0 20 0 80
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 5 803240 12492 8 1508 136 32 7508 32 2610 865 4 45 0 51
0 6 803240 12056 8 2004 64 8 1456 8 1138 312 9 18 0 73
1 6 803240 12668 8 1452 96 28 14856 28 2434 658 0 31 0 69
0 7 803240 13240 8 564 0 0 3112 0 4602 1492 4 38 0 58
0 10 803240 12768 8 688 36 15272 6000 15272 2026 431 26 39 0 35
0 2 81780 966512 8 5692 108 0 2904 0 2204 372 0 11 0 89
0 3 81780 966204 8 6056 128 0 488 3 1155 82 1 0 0 99
0 1 81780 965460 8 6260 492 0 696 0 1150 161 0 1 13 86
0 1 81732 963652 8 7860 8 0 1608 0 1147 199 1 2 42 55
0 1 81732 962052 8 8560 4 0 704 0 1129 177 6 1 43 50
0 1 81732 960120 8 9128 0 0 568 0 1124 161 12 2 57 29
0 1 81732 957512 8 9840 4 0 716 0 1137 191 13 2 27 58
1 0 81732 954992 8 10640 32 0 832 0 1135 191 14 1 47 38
1 0 81732 952824 8 11016 0 0 340 0 1096 128 64 1 18 16
1 0 81732 952152 8 11092 0 0 0 0 1062 80 99 1 0 0
1 0 81732 951424 8 11196 0 0 0 0 1062 105 99 1 0 0
1 0 81732 950808 8 11264 0 0 0 0 1062 74 99 1 0 0
$ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb
Killed
$
The file can be obtained from ftp://bighost.ba.itb.cnr.it-fixed/pub/Embnet/Database/UTR/data/
I am not a Perl guru, nor am I familiar with the BioPerl code. Does someone know
whether the parsed records are held in memory or not? It seems so.
I guess the objects could be freed by dropping the references to them
immediately after each record is written out in the new format. Or perhaps the
garbage collector does not work well in perl 5.8.8.
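As a possible workaround until the cause is found, the input could be split
into smaller batches so that each conversion run only ever sees a bounded
number of records. Below is a minimal sketch, assuming every EMBL record ends
with a line containing only "//". The function name split_embl, the chunk
size and the output prefix are hypothetical and not part of BioPerl.

```shell
# Split an EMBL flat file into chunks of at most N records each.
# Usage: split_embl INPUT RECORDS_PER_CHUNK OUTPUT_PREFIX
# Writes OUTPUT_PREFIX.0001.embl, OUTPUT_PREFIX.0002.embl, ...
split_embl() {
    awk -v n="$2" -v prefix="$3" '
        BEGIN {
            chunk = 1
            count = 0
            out = sprintf("%s.%04d.embl", prefix, chunk)
        }
        # Copy every input line to the current chunk file.
        { print > out }
        # A line consisting only of "//" terminates a record.
        $1 == "//" && NF == 1 {
            count++
            if (count == n) {
                # Chunk is full: close it and switch to the next file name.
                close(out)
                chunk++
                count = 0
                out = sprintf("%s.%04d.embl", prefix, chunk)
            }
        }
    ' "$1"
}
```

Each chunk could then be passed to bp_sreformat.pl with the same -if, -of,
-i and -o options as in the command above, and the resulting GenBank files
concatenated with cat. This only limits how many records one run handles;
it does not explain why the memory grows.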
Thanks for any help.
Martin
--
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs