[Bioperl-l] storing/retrieving a large hash on file system?
Brian Osborne
bosborne11 at verizon.net
Tue May 18 16:00:06 UTC 2010
Ben,
I've used Storable to do things like this, for example:
use Getopt::Long;
use Storable;
use Bio::DB::Fasta;

# usage() and make_my_id() are not shown here
my %species = ( "Sc" => 4932,   # Saccharomyces cerevisiae
                "Ec" => 83333,  # Escherichia coli K12
                "Hs" => 9606    # H. sapiens
              );
my ($help, $id, $name);
GetOptions( "s=s" => \$name,
            "i=i" => \$id,
            "h"   => \$help );
usage() if ($help || !$id || !$name);
my $storedHash = $name . ".dump";
# create index for a directory of fasta files
my $db = Bio::DB::Fasta->new($name, -makeid => \&make_my_id);
# build the hash only if it has not been stored already
unless (-e $storedHash) {
    my $ref;
    # extract species-specific information from gene2accession
    open my $in, "<", "gene2accession" or die "Cannot open gene2accession: $!\n";
    while (<$in>) {
        my @arr = split "\t", $_;
        if ($arr[0] == $species{$name} && $arr[9] =~ /\d+/ && $arr[10] =~ /\d+/) {
            ($ref->{$arr[1]}->{"start"},  $ref->{$arr[1]}->{"end"},
             $ref->{$arr[1]}->{"strand"}, $ref->{$arr[1]}->{"id"}) =
                ($arr[9], $arr[10], $arr[11], $arr[7]);
        }
    }
    close $in;
    # save species-specific information using Storable
    store $ref, $storedHash;
}
# retrieve the species-specific data from a stored hash
my $ref = retrieve($storedHash);
Take away all the parsing details and you can see that it's simple: Storable exports store() and retrieve(). Make up a file name, store() the hash reference in it, and retrieve() it whenever you need the data back.
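Stripped down to just the Storable part, a round trip might look like the sketch below. The %mutations hash, its keys, and the file name are made up for illustration, and the temp directory is only there so the example cleans up after itself.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical nested hash standing in for the result of the slow first step
my %mutations = (
    geneA => { start => 100, end => 250,  strand => '+' },
    geneB => { start => 900, end => 1200, strand => '-' },
);

# Hypothetical file name inside a temp directory that is removed on exit
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/mutations.dump";

# store() takes a reference and a file name; check its return value
store( \%mutations, $file ) or die "Could not store hash in $file\n";

# retrieve() returns a reference to a fresh copy of the stored structure
my $loaded = retrieve($file);

print "geneB ends at $loaded->{geneB}{end}\n";
```

The stored file is binary, so it is not human-readable, and a file written with store() is tied to the byte order of the machine that wrote it. Storable also provides nstore(), which writes in network order if the file has to move between architectures. For a local debugging cache, plain store() should be enough.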
Brian O.
On May 18, 2010, at 11:28 AM, Ben Bimber wrote:
> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step. the second step uses this data to perform downstream
> calculations. for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system. this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it. I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
>
> thanks in advance.
>
> -Ben
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l