[Bioperl-l] Help with threads and shared variable
Jonathan Crabtree
jonathancrabtree at gmail.com
Wed Dec 24 18:03:25 UTC 2008
Hi Marco,
I can't be exactly sure what's going wrong in your examples, since you
haven't posted the crucial "make_uge_array" function, or told us which
version of Perl you are using. However, I'd guess that perhaps you're
creating a multi-dimensional array without sharing anything except the
top-level array. Here is a short test program, which runs correctly
on perl 5.8.8 and may help to illustrate how the Perl threads::shared
module expects you to create and share nested data structures. You
have to share every nested reference explicitly, and the order of the
calls matters: share() discards any data already held by an array or
hash, so each container has to be shared while it is still empty and
filled in afterwards:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
# threads::shared test/demo program
# creates a shared 2-dimensional array and checks that it can be seen
# in a thread
# tested in perl v5.8.8 built for i486-linux-gnu-thread-multi
## ----------------------------------------
## globals
## ----------------------------------------
# set the width and height of the 2d array to this value:
my $ARRAY_SIZE = 10;
## ----------------------------------------
## main program
## ----------------------------------------
# calls to &share take place in here, so a shared value is returned
my $array = &make_shared_array();
# print array contents before running thread
print "shared array before running thread:\n";
&check_and_print_array($array);
# run thread
my $thr = threads->create(\&do_the_job, $array);
my $retval = $thr->join();
print "join() returned: $retval\n";
# print array contents after running thread
print "shared array after running thread:\n";
&check_and_print_array($array);
exit(0);
## ----------------------------------------
## subroutines
## ----------------------------------------
sub make_shared_array {
    # outermost array object must be made shared first; the leading &
    # bypasses share()'s prototype, which rejects an anonymous array ref
    my $shared = &share([]);
    for (my $i = 0; $i < $ARRAY_SIZE; ++$i) {
        # each of the rows must be explicitly shared while still empty
        my $row = &share([]);
        # and then added to the containing array
        $shared->[$i] = $row;
        # assign each cell a unique integer for verification purposes
        my $base = $i * $ARRAY_SIZE;
        for (my $j = 0; $j < $ARRAY_SIZE; ++$j) {
            $row->[$j] = $base + $j;
        }
    }
    return $shared;
}
# print out the array, checking that its dimensions match what we expect
sub check_and_print_array {
    my $arr = shift;
    die "not an array" if ((ref $arr) ne 'ARRAY');
    my $nr = scalar(@$arr);
    die "wrong number of rows in array" if ($nr != $ARRAY_SIZE);
    for (my $i = 0; $i < $nr; ++$i) {
        my $row = $arr->[$i];
        die "row $i not an array" if ((ref $row) ne 'ARRAY');
        my $nc = scalar(@$row);
        die "wrong number of columns in row $i" if ($nc != $ARRAY_SIZE);
        for (my $j = 0; $j < $nc; ++$j) {
            my $val = $row->[$j];
            printf("%10s", $val);
        }
        print "\n";
    }
}
# work to execute in the thread
sub do_the_job {
    my $var = shift;
    # print the array once more in the thread
    print "shared array in thread:\n";
    &check_and_print_array($var);
    return "do_the_job returned ok";
}
When I run it (on Ubuntu) the output looks like this:
shared array before running thread:
         0         1         2         3         4         5         6         7         8         9
        10        11        12        13        14        15        16        17        18        19
        20        21        22        23        24        25        26        27        28        29
        30        31        32        33        34        35        36        37        38        39
        40        41        42        43        44        45        46        47        48        49
        50        51        52        53        54        55        56        57        58        59
        60        61        62        63        64        65        66        67        68        69
        70        71        72        73        74        75        76        77        78        79
        80        81        82        83        84        85        86        87        88        89
        90        91        92        93        94        95        96        97        98        99
shared array in thread:
         0         1         2         3         4         5         6         7         8         9
        10        11        12        13        14        15        16        17        18        19
        20        21        22        23        24        25        26        27        28        29
        30        31        32        33        34        35        36        37        38        39
        40        41        42        43        44        45        46        47        48        49
        50        51        52        53        54        55        56        57        58        59
        60        61        62        63        64        65        66        67        68        69
        70        71        72        73        74        75        76        77        78        79
        80        81        82        83        84        85        86        87        88        89
        90        91        92        93        94        95        96        97        98        99
join() returned: do_the_job returned ok
shared array after running thread:
         0         1         2         3         4         5         6         7         8         9
        10        11        12        13        14        15        16        17        18        19
        20        21        22        23        24        25        26        27        28        29
        30        31        32        33        34        35        36        37        38        39
        40        41        42        43        44        45        46        47        48        49
        50        51        52        53        54        55        56        57        58        59
        60        61        62        63        64        65        66        67        68        69
        70        71        72        73        74        75        76        77        78        79
        80        81        82        83        84        85        86        87        88        89
        90        91        92        93        94        95        96        97        98        99
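The two failures in your first and second examples can be reproduced
in a few lines. This is only a sketch: the names (@filled, $slot,
$rows) are made up for illustration, and the behaviour described in
the comments is what the threads::shared documentation says (share()
discards any data an array or hash already holds, and a shared scalar
only accepts plain values or references to shared data), so details
may differ between versions:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Case 1: sharing an array that already holds data empties it
# (compare the second example, which printed 0 after share()).
my @filled = (1, 2, 3);
my $count_before = scalar(@filled);
share(@filled);
my $count_after = scalar(@filled);
print "before share(): $count_before, after share(): $count_after\n";

# Case 2: a shared scalar refuses a reference to unshared data
# (compare the first example's "Invalid value for shared scalar").
my $slot :shared;
my $assigned = eval { $slot = [1, 2, 3]; 1 } ? 1 : 0;
my $error = $assigned ? '' : $@;
print "assignment ", ($assigned ? "succeeded\n" : "failed: $error");

# Working order: share the empty container first, then fill it.
my $rows :shared;
$rows = &share([]);
push @$rows, 1, 2, 3;
my $count_good = scalar(@$rows);
print "shared first, filled afterwards: $count_good elements\n";
```

So a make_uge_array that builds an ordinary array and returns a
reference to it would hit one of the first two cases; it would need
to create the shared container first and populate it afterwards.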
I haven't verified that sharing the array this way actually yields
the memory savings you're looking for, but I don't see why it
shouldn't. Hope this helps,
Jonathan
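A possible alternative to the explicit &share([]) calls, offered as a
sketch under assumptions: newer releases of threads::shared export a
shared_clone() function that deep-copies an ordinary nested structure
into a shared one. I'm assuming a threads::shared recent enough to
provide it (the version bundled with perl 5.8.8 may need upgrading
from CPAN). The data in $plain and the thread body are made up for
illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Build an ordinary (unshared) 3x3 array of arrays.
my $plain = [];
for my $i (0 .. 2) {
    push @$plain, [ $i * 3, $i * 3 + 1, $i * 3 + 2 ];
}

# Deep-copy it into shared storage; the nested rows are shared too.
my $shared = shared_clone($plain);

# Read the shared structure from inside a thread.
my $thr = threads->create(sub {
    my $arr = shift;
    return join(' ', map { join(',', @$_) } @$arr);
}, $shared);
my $seen = $thr->join();

print "thread saw: $seen\n";
```

Because shared_clone() copies, the unshared original exists alongside
the shared copy until it is released, so for a very large array it
may be preferable to build the data directly in shared containers.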
On Sat, Dec 20, 2008 at 6:10 PM, Blanchette, Marco
<MAB at stowers-institute.org> wrote:
> Dear all,
>
> I am not sure this is the best place to post this question, but I don't really know where else to go... So, let's give it a shot.
>
> I am using Perl threads to successfully multithread several of my computing jobs on my workstation. My current problem is that I need to run multiple processes using the same humongous array (more than 2x10^6 items). The computing time for each iteration is not very long, but I have a lot of iterations to do, and every time a thread is created I pass the huge array to the function and a fresh copy of the array is created. Thus, a huge amount of resources (time and memory) is wasted creating these data structures, which are used by each thread but not modified.
>
> The logical alternative is to use shared memory, where all threads would have access to the same copy of the huge array. In principle, Perl provides such a mechanism through the threads::shared module, but I am unable to understand how to use the shared variables.
>
> Does anyone have experience to share on threads::shared? Here are a couple of unsuccessful attempts to use that module:
>
>
> ### first example
> my $var :shared; #create a shared scalar
> $var = make_uge_array; #returns a reference to a huge array; try to assign it to the shared scalar
> my $thr = threads->create(\&doTheJob,$var); #spawn a thread
> $thr->join; #Wait for the thread to return
> ### Generates the following error
> ### Invalid value for shared scalar at ...
>
> ### second example
> my $var = make_uge_array; #return a pointer to a huge array
> print scalar(@{$var}), "\n"; #print 2,000,000
>
> share($var);
> print scalar(@{$var}), "\n"; #print 0
>
> my $thr = threads->create(\&doTheJob,$var); #spawn a thread
> $thr->join; #Wait for the thread to return
>
> ### third example
> my @array :shared; #create a shared array
> make_uge_array(\@array); #pass a ref to the array to a function that populates it with 2,000,000 items
> print scalar(@array), "\n"; #print 2,000,000
>
> my $thr = threads->create(\&doTheJob,$var); #spawn a thread
> $thr->join; #Wait for the thread to return
>
> sub doTheJob{ scalar(@_), "\n"} ## prints 0
>
> Finally, I tried passing a ref to the huge shared array to threads->create, but the main process never stopped at join(); it bailed out with the thread still running.
>
> Any suggestion will be appreciated.
>
> Also, feel free to suggest a better place to post this request.
>
> Many thanks,
>
> Marco
>
> --
> Marco Blanchette, Ph.D.
> Assistant Investigator
> Stowers Institute for Medical Research
> 1000 East 50th St.
>
> Kansas City, MO 64110
>
> Tel: 816-926-4071
> Cell: 816-726-8419
> Fax: 816-926-2018