[Bioperl-l] Memory requirements for conversion from embl to genbank

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Thu Aug 31 12:44:58 UTC 2006


Hi,
  I use bp_sreformat.pl to convert a file from EMBL format
to GenBank. I use the current CVS HEAD version and cannot convert
two files. Each record is small, and I don't understand why
there is such a huge memory requirement. The machine has 1 GB of
RAM and runs a recent Linux kernel. Moreover, I could
parse the same file with bioperl-1.5.1 after I manually
fixed some missing quotes in the file.

  With the current changes to the EMBL and GenBank parsing (bug #2077),
I can no longer parse the file.

  Here is the memory status (vmstat output) at the moment when the machine ran
out of memory and the Linux kernel killed the application:

 1  0 803212  20936      8   2184    0    0     0     0 1062   38 99  1  0  0
 1  0 803208  19944      8   2184    0    0     0     0 1062   38 100  0  0  0
 1  0 803208  18828      8   2184    0    0     0     0 1061   37 100  0  0  0
 1  0 803204  17836      8   2184    0    0     0     0 1062   40 100  0  0  0
 1  0 803204  16844      8   2184    0    0     0     0 1062   48 100  0  0  0
 1  0 803200  15728      8   2184   32    0    32     0 1063   41 100  0  0  0
 1  0 803200  14736      8   2184    0    0     0     0 1062   41 99  1  0  0
 1  0 803196  13744      8   2184    0    0     0     0 1061   38 100  0  0  0
 1  0 803240  13640      8   2184    0   48     0    48 1063   68 99  1  0  0
 1  1 803240  12920      8   1984    0   40     0    40 1065  136 100  0  0  0
 1  1 803240  13192      8   1872    0 1056     0  1056 1114  326 96  4  0  0
 1  1 803240  14448      8   1336    0   20     0    20 1081  192 90 10  0  0
 1  1 803240  13656      8   1232    0   28     0    28 1070  104 87 13  0  0
 1  1 803240  12892      8   1260   32    4   176     4 1069  113 86 14  0  0
 0  4 803240  12144      8   1344  192   24   612    24 1088  185 44 16  0 40
 0  7 803240  11952      8   1180   32   32   508    32 1113  591 46 23  0 32
 0  3 803240  11948      8   1336 1120  500 10816   500 4390 1397  2 31  0 66
 2  6 803240  12056      8   1788  752  136  9412   136 6101 1795  0 27  0 73
 0  7 803240  12176      8   1748   12    0  2180     0 1132  326  0 20  0 80
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  5 803240  12492      8   1508  136   32  7508    32 2610  865  4 45  0 51
 0  6 803240  12056      8   2004   64    8  1456     8 1138  312  9 18  0 73
 1  6 803240  12668      8   1452   96   28 14856    28 2434  658  0 31  0 69
 0  7 803240  13240      8    564    0    0  3112     0 4602 1492  4 38  0 58
 0 10 803240  12768      8    688   36 15272  6000 15272 2026  431 26 39  0 35
 0  2  81780 966512      8   5692  108    0  2904     0 2204  372  0 11  0 89
 0  3  81780 966204      8   6056  128    0   488     3 1155   82  1  0  0 99
 0  1  81780 965460      8   6260  492    0   696     0 1150  161  0  1 13 86
 0  1  81732 963652      8   7860    8    0  1608     0 1147  199  1  2 42 55
 0  1  81732 962052      8   8560    4    0   704     0 1129  177  6  1 43 50
 0  1  81732 960120      8   9128    0    0   568     0 1124  161 12  2 57 29
 0  1  81732 957512      8   9840    4    0   716     0 1137  191 13  2 27 58
 1  0  81732 954992      8  10640   32    0   832     0 1135  191 14  1 47 38
 1  0  81732 952824      8  11016    0    0   340     0 1096  128 64  1 18 16
 1  0  81732 952152      8  11092    0    0     0     0 1062   80 99  1  0  0
 1  0  81732 951424      8  11196    0    0     0     0 1062  105 99  1  0  0
 1  0  81732 950808      8  11264    0    0     0     0 1062   74 99  1  0  0


$ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb
Killed
$

The file can be obtained from ftp://bighost.ba.itb.cnr.it-fixed/pub/Embnet/Database/UTR/data/

I am not a Perl guru, nor am I familiar with the BioPerl code. Does someone know
whether the parsed records are held in memory or not? It seems so.
I guess the objects could be freed by dropping the references to
them immediately after they are written out in the new format. Or perhaps the
garbage collector does not work well in Perl 5.8.8.
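
To make the question concrete, here is a minimal sketch in plain Perl of how
record lifetimes behave under Perl's reference counting. It does not use
BioPerl at all: the Record class and its fields are hypothetical stand-ins
for a parsed sequence record, and whether the real parser creates reference
cycles like the one in the second loop is an assumption, not something
established here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for one parsed sequence record (not a BioPerl class).
package Record;

our $live = 0;    # how many Record objects currently exist

sub new {
    my ($class, %args) = @_;
    $live++;
    return bless { id => $args{id} }, $class;
}

# Perl calls DESTROY when the last reference to an object goes away.
sub DESTROY { $live-- }

package main;

# Case 1: streaming. $rec holds the only reference, so each record is
# freed at the end of its loop iteration.
our $peak_streaming = 0;
for my $i (1 .. 1000) {
    my $rec = Record->new(id => $i);
    $peak_streaming = $Record::live if $Record::live > $peak_streaming;
}
our $live_after_streaming = $Record::live;

# Case 2: two records that point at each other form a reference cycle.
# Their counts never reach zero, so they stay allocated after the loop.
for my $i (1 .. 1000) {
    my $left  = Record->new(id => "left$i");
    my $right = Record->new(id => "right$i");
    $left->{partner}  = $right;
    $right->{partner} = $left;
}
our $live_after_cycles = $Record::live;

printf "streaming: peak=%d, still alive=%d\n",
    $peak_streaming, $live_after_streaming;
printf "cycles:    still alive=%d\n", $live_after_cycles;
```

In the first loop at most one record exists at a time and none survive the
loop; in the second loop all 2000 records remain allocated. If the parser
builds objects that refer back to each other, memory would grow in the
second way even though each record is small, and Scalar::Util's weaken is
the usual tool for breaking such cycles.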

Thanks for any help.
Martin

-- 
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs


