[Bioperl-l] storing/retrieving a large hash on file system?

Tue May 18 16:47:33 UTC 2010

Thanks for all the suggestions.  Storable seems like the simplest
route.  This will save me hours of staring at my computer.
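
For anyone finding this thread later, here is a minimal sketch of the
cache-on-disk pattern with Storable, not a definitive implementation.
The names build_mutation_hash and load_or_build, the file name, and
the sample data are all hypothetical stand-ins for the slow first step
described below.  nstore() and retrieve() are the real Storable calls.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the slow first step that builds the hash.
sub build_mutation_hash {
    return {
        sample1 => { pos => [ 101, 245 ], ref => 'A', alt => 'G' },
        sample2 => { pos => [ 77 ],       ref => 'C', alt => 'T' },
    };
}

# Return the cached structure if the file exists; otherwise build it
# once and write it out with nstore (network byte order, so the file
# can be read on a machine with a different architecture).
sub load_or_build {
    my ($file) = @_;
    return retrieve($file) if -e $file;
    my $data = build_mutation_hash();
    nstore( $data, $file );
    return $data;
}

# A temporary directory keeps the sketch self-contained; a real script
# would use a fixed path so the cache survives between runs.
my $dir   = tempdir( CLEANUP => 1 );
my $cache = "$dir/mutations.storable";

my $first  = load_or_build($cache);    # builds and stores
my $second = load_or_build($cache);    # reads back from disk

print "alt allele for sample1: $second->{sample1}{alt}\n";
```

Deleting the cache file forces the first step to run again, which is
handy whenever that code changes.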

-Ben

On Tue, May 18, 2010 at 11:39 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
> On Tue, May 18, 2010 at 12:09 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>
>>
>> On Tue, May 18, 2010 at 11:28 AM, Ben Bimber <bimber at wisc.edu> wrote:
>>>
>>> This question is more of a general Perl one than BioPerl-specific, so
>>> I hope it is appropriate for this list:
>>>
>>> I am writing code that has two steps.  The first generates a large,
>>> complex hash describing mutations.  It takes a fair amount of time to
>>> run this step.  The second step uses this data to perform downstream
>>> calculations.  For the purposes of writing/debugging this downstream
>>> code, it would save me a lot of time if I could run the first step
>>> once, then store this hash in something like the file system.  This
>>> way I could quickly load it when debugging the downstream code,
>>> without waiting for the hash to be recreated.
>>>
>>> Is there a 'best practice' way to do something like this?  I could
>>> save a tab-delimited file, which is human readable but does not
>>> represent the structure of the hash, so I would need code to re-parse
>>> it.  I assume I could probably do something along the lines of dumping
>>> a JSON string, then reading/decoding it.  This is easy, but not so
>>> human-readable.  Is there another option I'm not thinking of?  What do
>>> others do in this sort of situation?
>>>
>>> Thanks in advance.
>>>
>>
>> There are probably a number of solutions on CPAN.  Here is one that
>> may be off the beaten path, but it is getting a lot of press in the
>> NoSQL database realm:
>>
>> http://1978th.net/tokyocabinet/
>>
>
> Just to be clear, I am assuming that the problem at hand is storing
> key/value pairs and then retrieving them later.  If what you are talking
> about is a multi-level hash data structure, then Data::Dumper might be the
> easiest way to go.
>
> Sorry for the confusion....
>
> Sean
>
>
>
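
The Data::Dumper route mentioned above keeps the file human-readable.
A rough sketch of the round trip follows, under the assumption that
the dump file is trusted: reading it back with do() executes it as
Perl code.  The variable names, file name, and sample data are
hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);

# Sorted keys and light indentation give stable, diff-friendly output.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Hypothetical multi-level hash standing in for the mutation data.
my %mutations = ( sample1 => { pos => [ 101, 245 ], alt => 'G' } );

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/mutations.pl";

# Dumper() emits Perl source of the form "$VAR1 = { ... };".
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} Dumper( \%mutations );
close $out or die "Cannot close $file: $!";

# do() compiles and runs the file, returning the value of its last
# statement, which here is the hash reference assigned to $VAR1.
my $restored = do $file;
die "Could not reload $file: " . ( $@ || $! )
    unless ref $restored eq 'HASH';

print "positions: @{ $restored->{sample1}{pos} }\n";
```

The dump can be opened in any text editor, at the cost of slower
loading than a binary format for very large structures.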