[Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
Tim Cutts
tjrc at sanger.ac.uk
Tue Nov 30 04:57:17 EST 2004
On 29 Nov 2004, at 11:32 pm, Ian Korf wrote:
> Here's something odd. The following labeled block looks like it should
> use no memory.
>
> BLOCK: {
> my $FOO = 'N' x 100000000;
> }
>
> The weird thing is that after executing the block, the memory
> footprint is still 192 Mb as if it hadn't been garbage collected.
Perl's garbage collection does not give the memory back to the OS; it
just marks the allocated memory for internal reuse by subsequent
allocations within perl.
This is true of most UNIX programs, not just perl.  free() does not
necessarily give the memory back to the operating system; it just marks
it for re-use by the current process the next time it calls malloc().
When that happens, the memory doesn't become available to the OS until
the program exits.
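To make the "marked for re-use" idea concrete, here is a deliberately tiny
model of an allocator that keeps freed blocks on a free list. It is only a
sketch: real malloc() implementations are far more elaborate, and every
name in it (toy_alloc, toy_free, toy_footprint, the block sizes) is
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fixed-size block pool. Illustration only: this is NOT how any
   real malloc() is implemented, and all names are hypothetical. */
#define BLOCK_SIZE  64
#define BLOCK_COUNT 4

typedef union block {
    union block *next;          /* meaningful only while the block is free */
    char payload[BLOCK_SIZE];   /* meaningful only while the block is in use */
} block;

static block arena[BLOCK_COUNT];  /* stands in for memory taken from the OS */
static block *free_list = NULL;   /* blocks released by toy_free() */
static size_t next_unused = 0;    /* arena blocks never handed out so far */

/* Hand out a block, preferring one that was freed earlier. */
void *toy_alloc(void) {
    if (free_list != NULL) {
        block *b = free_list;
        free_list = b->next;
        return b;
    }
    if (next_unused < BLOCK_COUNT)
        return &arena[next_unused++];
    return NULL;                  /* pool exhausted */
}

/* Release a block: it is pushed onto the free list for later re-use.
   Nothing is handed back to whoever provided the arena. */
void toy_free(void *p) {
    block *b = p;
    assert(p != NULL);            /* freeing NULL is a caller bug in this toy */
    b->next = free_list;
    free_list = b;
}

/* Number of arena blocks ever touched; toy_free() never lowers it. */
size_t toy_footprint(void) {
    return next_unused;
}
```

In this model, freeing a block and then allocating again hands back the
same block, while toy_footprint() stays where it was. That is the
behaviour described above, in miniature.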
This is one reason why garbage-collected languages like perl and java
should not be relied on to keep memory under control; GC does *not*
absolve the programmer from the need to keep their memory usage tight.
Consider the following C program (which you need to run on an OS which
actually populates all the contents of the rusage struct - Linux does
not, and neither does MacOS X, but Tru64 does):
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Print the memory fields of the rusage struct under the label x.
   #x stringifies the argument, so the label is printed with its quotes. */
#define PRINT_RESOURCES(x) do { \
        getrusage(RUSAGE_SELF, &r); \
        printf(#x "\n\nShared: %ld\nUnshared: %ld\nStack: %ld\n\n", \
               r.ru_ixrss, r.ru_idrss, r.ru_isrss); \
    } while (0)

int main(void) {
    char *p;
    struct rusage r;
    int i;

    PRINT_RESOURCES("Program start");

    p = malloc(100000000);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }

    /* Use the memory */
    for (i = 0; i < 100000000; i++)
        p[i] = 'N';
    PRINT_RESOURCES("After malloc");

    free(p);
    PRINT_RESOURCES("After free");

    return 0;
}
The output on this Tru64 machine is:
09:46:26 tjrc@ecs2d:~$ ./memtest
"Program start"
Shared: 0
Unshared: 0
Stack: 0
"After malloc"
Shared: 19
Unshared: 116577
Stack: 19
"After free"
Shared: 19
Unshared: 116577
Stack: 19
As you can see, free() does not actually release the memory from the
process back to the operating system.
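For anyone who wants to repeat the measurement on Linux, where the rusage
fields used above stay at zero, a rough counterpart can read the resident
set size from /proc/self/statm instead. This is a sketch under
assumptions: it needs a Linux-style /proc, and the helper names
resident_bytes and measure are made up for this example. Bear in mind
that some allocators (glibc's, as far as I know) typically serve very
large requests with mmap() and may unmap them again on free(), so the
figure after free() may drop on such systems; many small allocations are
more likely to show the behaviour seen on Tru64.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: resident set size of this process in bytes, read
   from /proc/self/statm (Linux-specific). Returns -1 if the file cannot
   be opened or parsed, e.g. on a system without /proc. */
long resident_bytes(void) {
    unsigned long size_pages, resident_pages;
    long page_size = sysconf(_SC_PAGESIZE);
    FILE *f = fopen("/proc/self/statm", "r");

    if (f == NULL)
        return -1;
    /* The first two fields are total program size and resident set
       size, both counted in pages. */
    if (page_size <= 0 ||
        fscanf(f, "%lu %lu", &size_pages, &resident_pages) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (long)resident_pages * page_size;
}

/* Allocate n bytes, touch every one of them, free them, and record the
   resident size before the malloc, after the fill and after the free.
   Returns 0 on success, -1 if malloc() fails. */
int measure(size_t n, long out[3]) {
    volatile char *v;   /* volatile so the fill loop is not optimised away */
    char *p;
    size_t i;

    assert(out != NULL);
    out[0] = resident_bytes();

    p = malloc(n);
    if (p == NULL)
        return -1;
    v = p;
    for (i = 0; i < n; i++)
        v[i] = 'N';
    out[1] = resident_bytes();

    free(p);
    out[2] = resident_bytes();
    return 0;
}
```

Calling measure(100000000, out) from a small main() and printing the
three values gives figures comparable to the "Program start", "After
malloc" and "After free" lines above, in bytes rather than in the units
the rusage struct uses.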
>
> sub foo {my $FOO = 'N' x 100000000}
> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s
>
> sub bar {my $BAR = 'N' x 100000000; undef $BAR}
> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s
>
> The increase from 1 sec to 21 sec system CPU time is all the extra
> memory allocation and freeing associated with the undef statement. Why
> the user time is less in the undef example is a mystery to me.
I can explain this.  You're forgetting that the value of the final
statement in a perl subroutine is always its return value, even if you
don't specify 'return'.  So if you allocate 100MB of Ns, as in the first
case, and that assignment is the last statement in the subroutine, you
return it, which forces perl to *copy* that lexically scoped variable
each time the routine is called.  That's why the program uses 200MB of
memory, not 100MB.
In the second version, the last statement is the undef, so perl never
has to copy the 100MB return value, and its memory footprint is half.
Using undef has not actually given any memory back to the OS; it has
just changed the return value of the function and stopped perl doubling
its memory use.
The lesson here is therefore to be very careful in perl subroutines
where you don't care about the return value: make sure the return value
is something tiny, for example by ending the subroutine with a bare
'return;'.  Perl has no equivalent to a C void function.
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233