[Bioperl-l] Memory requirements for conversion from embl to genbank
Chris Fields
cjfields at uiuc.edu
Thu Aug 31 13:32:59 UTC 2006
Martin,
Do you get the same issue using SeqIO?
#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::SeqIO;

my $file_in  = '5UTR.Vrl_nr.dat';
my $file_out = '5UTR.Vrl_nr.gb';

# Read the EMBL records one at a time and write each out as GenBank.
my $seqin = Bio::SeqIO->new(-format => 'embl',
                            -file   => "<$file_in");
my $seqout = Bio::SeqIO->new(-format => 'genbank',
                             -file   => ">$file_out");
while (my $seq = $seqin->next_seq) {
    print "Acc:", $seq->accession_number, "\n";
    $seqout->write_seq($seq);
}
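
On whether parsed records stay in memory: Perl frees a value when its
reference count drops to zero, so a lexical declared inside the loop
should be released on every pass unless something else still refers to
it. The sketch below uses only core Perl and a hypothetical Record class
(a stand-in, not a BioPerl class) to show both cases. It illustrates the
mechanism only and is not a diagnosis of the parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed record; this is not a BioPerl class.
package Record;
my $live = 0;    # number of Record objects currently alive
sub new     { my ($class, $id) = @_; $live++; return bless { id => $id }, $class }
sub DESTROY { $live-- }
sub live    { return $live }

package main;

# Case 1: a loop-scoped lexical, like $seq in a next_seq loop.
my $max_live = 0;
for my $id (1 .. 1000) {
    my $rec = Record->new($id);
    $max_live = Record::live() if Record::live() > $max_live;
}    # $rec's last reference goes away at the end of every pass
my $live_after_loop = Record::live();

# Case 2: two objects that refer to each other are never freed
# by reference counting alone, even after leaving the block.
{
    my $x = Record->new('x');
    my $y = Record->new('y');
    $x->{peer} = $y;
    $y->{peer} = $x;
}
my $live_after_cycle = Record::live();

print "max live in loop: $max_live\n";
print "live after loop:  $live_after_loop\n";
print "live after cycle: $live_after_cycle\n";
```

If memory grows with a loop like the one in case 1, a reference cycle or
a cache that keeps records alive is one possible explanation. Weakening
one side of a cycle with Scalar::Util's weaken() is a common way to
break it.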
Chris
On Aug 31, 2006, at 7:44 AM, Martin MOKREJŠ wrote:
> Hi,
> I use bp_sreformat.pl to convert a file from EMBL format
> to GenBank. I am using the current CVS HEAD version and cannot
> parse two files. Each record is small, and I don't understand why
> there is such a huge memory requirement. The machine has 1GB of
> RAM and is running a recent Linux kernel. Moreover, I could
> parse the same file with bioperl-1.5.1 once I had manually
> fixed some missing quotes in the file.
>
> With the current changes to the embl & genbank parsing (bug #2077)
> I can no longer parse the file.
>
> Here is the memory status at the moment when the machine ran
> out of memory and the Linux kernel killed the application:
>
> 1 0 803212 20936 8 2184 0 0 0 0 1062 38 99 1 0 0
> 1 0 803208 19944 8 2184 0 0 0 0 1062 38 100 0 0 0
> 1 0 803208 18828 8 2184 0 0 0 0 1061 37 100 0 0 0
> 1 0 803204 17836 8 2184 0 0 0 0 1062 40 100 0 0 0
> 1 0 803204 16844 8 2184 0 0 0 0 1062 48 100 0 0 0
> 1 0 803200 15728 8 2184 32 0 32 0 1063 41 100 0 0 0
> 1 0 803200 14736 8 2184 0 0 0 0 1062 41 99 1 0 0
> 1 0 803196 13744 8 2184 0 0 0 0 1061 38 100 0 0 0
> 1 0 803240 13640 8 2184 0 48 0 48 1063 68 99 1 0 0
> 1 1 803240 12920 8 1984 0 40 0 40 1065 136 100 0 0 0
> 1 1 803240 13192 8 1872 0 1056 0 1056 1114 326 96 4 0 0
> 1 1 803240 14448 8 1336 0 20 0 20 1081 192 90 10 0 0
> 1 1 803240 13656 8 1232 0 28 0 28 1070 104 87 13 0 0
> 1 1 803240 12892 8 1260 32 4 176 4 1069 113 86 14 0 0
> 0 4 803240 12144 8 1344 192 24 612 24 1088 185 44 16 0 40
> 0 7 803240 11952 8 1180 32 32 508 32 1113 591 46 23 0 32
> 0 3 803240 11948 8 1336 1120 500 10816 500 4390 1397 2 31 0 66
> 2 6 803240 12056 8 1788 752 136 9412 136 6101 1795 0 27 0 73
> 0 7 803240 12176 8 1748 12 0 2180 0 1132 326 0 20 0 80
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 5 803240 12492 8 1508 136 32 7508 32 2610 865 4 45 0 51
> 0 6 803240 12056 8 2004 64 8 1456 8 1138 312 9 18 0 73
> 1 6 803240 12668 8 1452 96 28 14856 28 2434 658 0 31 0 69
> 0 7 803240 13240 8 564 0 0 3112 0 4602 1492 4 38 0 58
> 0 10 803240 12768 8 688 36 15272 6000 15272 2026 431 26 39 0 35
> 0 2 81780 966512 8 5692 108 0 2904 0 2204 372 0 11 0 89
> 0 3 81780 966204 8 6056 128 0 488 3 1155 82 1 0 0 99
> 0 1 81780 965460 8 6260 492 0 696 0 1150 161 0 1 13 86
> 0 1 81732 963652 8 7860 8 0 1608 0 1147 199 1 2 42 55
> 0 1 81732 962052 8 8560 4 0 704 0 1129 177 6 1 43 50
> 0 1 81732 960120 8 9128 0 0 568 0 1124 161 12 2 57 29
> 0 1 81732 957512 8 9840 4 0 716 0 1137 191 13 2 27 58
> 1 0 81732 954992 8 10640 32 0 832 0 1135 191 14 1 47 38
> 1 0 81732 952824 8 11016 0 0 340 0 1096 128 64 1 18 16
> 1 0 81732 952152 8 11092 0 0 0 0 1062 80 99 1 0 0
> 1 0 81732 951424 8 11196 0 0 0 0 1062 105 99 1 0 0
> 1 0 81732 950808 8 11264 0 0 0 0 1062 74 99 1 0 0
>
>
> $ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb
> Killed
> $
>
> The file can be obtained from
> ftp://bighost.ba.itb.cnr.it-fixed/pub/Embnet/Database/UTR/data/
>
> I am not a Perl guru, nor am I familiar with the BioPerl code. Does
> someone know whether the parsed records are held in memory or not?
> It seems so. I guess the objects could be freed by dereferencing
> them immediately after they are written out in the new format. Or
> perhaps the garbage collector does not work well in perl 5.8.8.
>
> Thanks for any help.
> Martin
>
> --
> Dr. Martin Mokrejs
> Faculty of Science, Charles University
> Vinicna 5, 128 43 Prague, Czech Republic
> http://www.iresite.org
> http://www.iresite.org/~mmokrejs
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign