[Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
Ian Korf
iankorf at mac.com
Tue Nov 30 10:57:45 EST 2004
Perl does give memory back to the OS. If I do
my $dna = 'N' x 100000000;
the memory footprint is 192 MB.
undef $dna;
restores half the memory. This is not within a subroutine, but within
the main program.
On Nov 30, 2004, at 1:57 AM, Tim Cutts wrote:
>
> On 29 Nov 2004, at 11:32 pm, Ian Korf wrote:
>> Here's something odd. The following labeled block looks like it
>> should use no memory.
>>
>> BLOCK: {
>> my $FOO = 'N' x 100000000;
>> }
>>
>> The weird thing is that after executing the block, the memory
>> footprint is still 192 MB, as if it hadn't been garbage collected.
>
> Perl's garbage collection does not give the memory back to the OS; it
> just marks the allocated memory for internal reuse by subsequent
> allocations within perl.
>
> This is actually true of most UNIX programs; this is not unique to
> perl. free() does not necessarily give the memory back to the
> operating system, it just marks it for re-use by the current process
> the next time it calls malloc(). The memory doesn't become available
> to the OS until the program exits.
>
> This is one reason why garbage-collected languages like Perl and Java
> should not be relied on to keep memory under control; GC does *not*
> absolve the programmer from the need to keep their memory usage tight.
>
> Consider the following C program (which you need to run on an OS which
> actually populates all the contents of the rusage struct - Linux does
> not, and neither does MacOS X, but Tru64 does):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/time.h>
> #include <sys/resource.h>
>
> #define PRINT_RESOURCES(x) getrusage(RUSAGE_SELF, &r);\
> printf(#x "\n\nShared: %ld\nUnshared: %ld\nStack: %ld\n\n",\
> r.ru_ixrss, r.ru_idrss, r.ru_isrss)
>
> int main(void) {
>
> char *p;
> struct rusage r;
> int i;
>
> PRINT_RESOURCES("Program start");
>
> p = malloc(100000000);
>
> /* Use the memory */
> for (i = 0; i<100000000; i++)
> p[i] = 'N';
>
> PRINT_RESOURCES("After malloc");
>
> free(p);
>
> PRINT_RESOURCES("After free");
>
> return 0;
>
> }
>
> The output on this Tru64 machine is:
>
> 09:46:26 tjrc at ecs2d:~$ ./memtest
> "Program start"
>
> Shared: 0
> Unshared: 0
> Stack: 0
>
> "After malloc"
>
> Shared: 19
> Unshared: 116577
> Stack: 19
>
> "After free"
>
> Shared: 19
> Unshared: 116577
> Stack: 19
>
> As you can see, free() does not actually release the memory from the
> process back to the operating system.
>
>> sub foo {my $FOO = 'N' x 100000000}
>> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s
>>
>> sub bar {my $BAR = 'N' x 100000000; undef $BAR}
>> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s
>>
>> The increase from 1 sec to 21 sec system CPU time is all the extra
>> memory allocation and freeing associated with the undef statement.
>> Why the user time is less in the undef example is a mystery to me.
>
> I can explain this. It's because you're forgetting that the final
> statement in a perl subroutine is always its return value, even if you
> don't specify 'return', so if you allocate 100MB of Ns, as in the
> first case, and then return it (which you do because the allocation is
> the last statement in the subroutine) you actually force perl to
> *copy* that lexically scoped variable each time the routine is called.
> That's why the program uses 200MB of memory, not 100MB.
>
> In the second version, by explicitly freeing the memory, perl never
> has to copy the return value, so its memory footprint is half.
>
> Using undef has not actually freed any memory at all, it's just
> changed the return value from the function and stopped perl doubling
> its memory use.
>
> The lesson here is therefore to be very careful in perl subroutines
> where you don't care about the return value to make sure the return
> value is something tiny. Perl has no equivalent to a C void
> function.
>
> Tim
>
> --
> Dr Tim Cutts
> Informatics Systems Group, Wellcome Trust Sanger Institute
> GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233