[Bioperl-l] cloning and Storable
Will Spooner
whs at sanger.ac.uk
Fri Sep 5 04:39:58 EDT 2003
This may be a good time to mention that we have developed a
Bio::Root::Storable module for use with the Ensembl web site.
This module is used specifically for serialising/retrieving Search objects
to disk after parsing with SearchIO. It is in production, and works very
well, even in a high-throughput environment.
The implementation is generic (can be inhereted by any bioperl object),
and even implements a clone() method.
I have attached the module to this mail, and pasted the description below.
I would be delighted to see this module incorperated into BioPerl if
appropriate. I don't know whether it will solve Hekki's problem, but may
provide an alternative to writing a new serialiser!
Regards,
Will
---
Will Spooner whs at sanger.ac.uk
Ensembl Web Developer http://www.ensembl.org
NAME
Bio::Root::Storable - object serialisation methods
SYNOPSIS
my $storable = Bio::Root::Storable->new();
# Store/retrieve using class retriever
my $token = $storable->store();
my $storable2 = Bio::Root::Storable->retrieve( $token );
# Store/retrieve using object retriever
my $storable2 = $storable->new_retrievable();
$storable2->retrieve();
DESCRIPTION
Generic module that allows objects to be safely stored/retrieved from
disk. Can be inhereted by any BioPerl object. As it will not usually be
the first class in the inheretence list, _initialise_storable() should
be called during object instantiation.
Currently stores objects in binary format (using the Perl Storable
module). This can cause problems when storing and retrieving with different
versions of Storable (e.g. on different machines). An ASCII storage
option (using Data::Dumper) may be implemented in the future.
Object storage is recursive; If the object being stored contains other
storable objects, these will be stored seperately, and replaced by a
skeleton object in the parent heirarchy. When the parent is later
retrieved, its children remain in the skeleton state until explicitly
retrieved by the parent. This lazy-retrieve approach has obvious memory
efficiency benefits for certain applications.
-------------- next part --------------
=head1 NAME
Bio::Root::Storable - object serialisation methods
=head1 SYNOPSIS
my $storable = Bio::Root::Storable->new();
# Store/retrieve using class retriever
my $token = $storable->store();
my $storable2 = Bio::Root::Storable->retrieve( $token );
# Store/retrieve using object retriever
my $storable2 = $storable->new_retrievable();
$storable2->retrieve();
=head1 DESCRIPTION
Generic module that allows objects to be safely stored/retrieved from
disk. Can be inhereted by any BioPerl object. As it will not usually
be the first class in the inheretence list, _initialise_storable()
should be called during object instantiation.
Currently stores objects in binary format (using the Perl Storable
module). This can cause problems when storing and retrieving with
different versions of Storable (e.g. on different machines). An
ASCII storage option (using Data::Dumper) may be implemented in the
future.
Object storage is recursive; If the object being stored contains other
storable objects, these will be stored seperately, and replaced by a
skeleton object in the parent heirarchy. When the parent is later
retrieved, its children remain in the skeleton state until explicitly
retrieved by the parent. This lazy-retrieve approach has obvious
memory efficiency benefits for certain applications.
=cut
# Let the code begin...
package Bio::Root::Storable;
use strict;
#use Data::Dumper qw( Dumper );
use Storable qw( freeze thaw dclone );
use vars qw(@ISA);
use Bio::Root::Root;
use Bio::Root::IO;
@ISA = qw( Bio::Root::Root );
#----------------------------------------------------------------------
=head2 new
Arg [1] : -workdir => filesystem path,
-template => tmpfile template,
-suffix => tmpfile suffix,
Function : Builds a new Bio::Root::Storable inhereting object
Returntype: Bio::Root::Storable inhereting object
Exceptions:
Caller :
Example : $storable = Bio::Root::Storable->new()
=cut
sub new {
my ($caller, @args) = @_;
my $self = $caller->SUPER::new(@args);
$self->_initialise_storable;
return $self;
}
#----------------------------------------------------------------------
=head2 _initialise_storable
Arg [1] : See 'new' method
Function : Initialises storable-specific attributes
Returntype: boolean
Exceptions:
Caller :
Example :
=cut
sub _initialise_storable {
my $self = shift;
my( $workdir, $template, $suffix ) =
$self->_rearrange([qw(WORKDIR TEMPLATE SUFFIX)], @_ );
$workdir && $self->workdir ( $workdir );
$template && $self->template( $template );
$suffix && $self->suffix ( $suffix );
return 1;
}
#----------------------------------------------------------------------
=head2 statefile
Arg [1] : string (optional)
Function : Accessor for the file to write state into.
Should not normaly use as a setter - let Root::IO
do this for you.
Returntype: string
Exceptions:
Caller : Bio::Root::Storable->store
Example : my $statefile = $obj->statefile();
=cut
sub statefile{
my $key = '_statefile';
my $self = shift;
if( @_ ){ $self->{$key} = shift }
if( ! $self->{$key} ){ # Create a new statefile
my $workdir = $self->workdir;
my $template = $self->template;
my $suffix = $self->suffix;
# TODO: add cleanup and unlink methods. For now, we'll keep the
# statefile hanging around.
my @args = ( CLEANUP=>0, UNLINK=>0 );
if( $template ){ push( @args, 'TEMPLATE' => $template )};
if( $workdir ){ push( @args, 'DIR' => $workdir )};
if( $suffix ){ push( @args, 'SUFFIX' => $suffix )};
my( $fh, $file ) = Bio::Root::IO->new->tempfile( @args );
$self->{$key} = $file;
}
return $self->{$key};
}
#----------------------------------------------------------------------
=head2 workdir
Arg [1] : string (optional) (TODO - convert to array for x-platform)
Function : Accessor for the statefile directory. Defaults to
$Bio::Root::IO::TEMPDIR
Returntype: string
Exceptions:
Caller :
Example : $obj->workdir('/tmp/foo');
=cut
sub workdir {
my $key = '_workdir';
my $self = shift;
if( @_ ){
my $caller = join( ', ', (caller(0))[1..2] );
$self->{$key} && $self->debug("Overwriting workdir: probably bad!");
$self->{$key} = shift
}
$self->{$key} ||= $Bio::Root::IO::TEMPDIR;
return $self->{$key};
}
#----------------------------------------------------------------------
=head2 template
Arg [1] : string (optional)
Function : Accessor for the statefile template. Defaults to XXXXXXXX
Returntype: string
Exceptions:
Caller :
Example : $obj->workdir('RES_XXXXXXXX');
=cut
sub template {
my $key = '_template';
my $self = shift;
if( @_ ){ $self->{$key} = shift }
$self->{$key} ||= 'XXXXXXXX';
return $self->{$key};
}
#----------------------------------------------------------------------
=head2 suffix
Arg [1] : string (optional)
Function : Accessor for the statefile template.
Returntype: string
Exceptions:
Caller :
Example : $obj->suffix('.state');
=cut
sub suffix {
my $key = '_suffix';
my $self = shift;
if( @_ ){ $self->{$key} = shift }
return $self->{$key};
}
#----------------------------------------------------------------------
=head2 new_retrievable
Arg [1] : Same as for 'new'
Function : Similar to store, except returns a 'skeleton' of the calling
object, rather than the statefile.
The skeleton can be repopulated by calling 'retrieve'. This
will be a clone of the original object.
Returntype: Bio::Root::Storable inhereting object
Exceptions:
Caller :
Example : my $clone = $obj->new_retrievable(); # skeleton
$skel->retrieve(); # clone
=cut
sub new_retrievable{
my $self = shift;
my @args = @_;
$self->_initialise_storable( @args );
if( $self->retrievable ){ return $self->clone } # Clone retrievable
return bless( { _statefile => $self->store(@args),
_workdir => $self->workdir,
_suffix => $self->suffix,
_template => $self->template,
_retrievable => 1 }, ref( $self ) );
}
#----------------------------------------------------------------------
=head2 retrievable
Arg [1] : none
Function : Reports whether the object is in 'skeleton' state, and the
'retrieve' method can be called.
Returntype: boolean
Exceptions:
Caller :
Example : if( $obj->retrievable ){ $obj->retrieve }
=cut
sub retrievable {
my $self = shift;
if( @_ ){ $self->{_retrievable} = shift }
return $self->{_retrievable};
}
#----------------------------------------------------------------------
=head2 token
Arg [1] : None
Function : Accessor for token attribute
Returntype: string. Whatever retrieve needs to retrieve.
This base implementation returns the statefile
Exceptions:
Caller :
Example : my $token = $obj->token();
=cut
sub token{
my $self = shift;
return $self->statefile;
}
#----------------------------------------------------------------------
=head2 store
Arg [1] : none
Function : Saves a serialised representation of the object structure
to disk. Returns the name of the file that the object was
saved to.
Returntype: string
Exceptions:
Caller :
Example : my $token = $obj->store();
=cut
sub store{
my $self = shift;
my $statefile = $self->statefile;
my $store_obj = $self->serialise;
my $io = Bio::Root::IO->new( ">$statefile" );
$io->_print( $store_obj->freeze() );
$self->debug( "STORING $store_obj to $statefile\n" );
return $statefile;
}
#----------------------------------------------------------------------
=head2 serialise
Arg [1] : none
Function : Prepares the the serialised representation of the object.
Object attribute names starting with '__' are skipped.
This is useful for those that do not serialise too well
(e.g. filehandles).
Attributes are examined for other storable objects. If these
are found they are serialised seperately using 'new_retrievable'
Returntype: string
Exceptions:
Caller :
Example : my $serialised = $obj->_prepare_storable();
=cut
sub serialise{
my $self = shift;
# Create a new object of same class that is going to be serialised
my $store_obj = bless( {}, ref( $self ) );
my %retargs = ( -workdir =>$self->workdir,
-suffix =>$self->suffix,
-template=>$self->template );
# Assume that other storable bio objects held by this object are
# only 1-deep.
foreach my $key( keys( %$self ) ){
if( $key =~ /^__/ ){ next } # Ignore keys starting with '__'
my $value = $self->{$key};
# Scalar value
if( ! ref( $value ) ){
$store_obj->{$key} = $value;
}
# Bio::Root::Storable obj: save placeholder
elsif( ref($value) =~ /^Bio::/ and $value->isa('Bio::Root::Storable') ){
# Bio::Root::Storable
$store_obj->{$key} = $value->new_retrievable( %retargs );
next;
}
# Arrayref value. Look for Bio::Root::Storable objs
elsif( ref( $value ) eq 'ARRAY' ){
my @ary;
foreach my $val( @$value ){
if( ref($val) =~ /^Bio::/ and $val->isa('Bio::Root::Storable') ){
push( @ary, $val->new_retrievable( %retargs ) );
}
else{ push( @ary, $val ) }
}
$store_obj->{$key} = \@ary;
}
# Hashref value. Look for Bio::Root::Storable objs
elsif( ref( $value ) eq 'HASH' ){
my %hash;
foreach my $k2( keys %$value ){
my $val = $value->{$k2};
if( ref($val) =~ /^Bio::/ and $val->isa('Bio::Root::Storable') ){
$hash{$k2} = $val->new_retrievable( %retargs );
}
else{ $hash{$k2} = $val }
}
$store_obj->{$key} = \%hash;
}
# Unknown, just add to the store object regardless
else{ $store_obj->{$key} = $value }
}
$store_obj->retrievable(0); # Once deserialised, obj not retrievable
return $store_obj;
}
#----------------------------------------------------------------------
=head2 retrieve
Arg [1] : string; filesystem location of the state file to be retrieved
Function : Retrieves a stored object from disk.
Note that the retrieved object will be blessed into its original
class, and not the
Returntype: Bio::Root::Storable inhereting object
Exceptions:
Caller :
Example : my $obj = Bio::Root::Storable->retrieve( $token );
=cut
sub retrieve{
my( $caller, $statefile ) = @_;
my $self = {};
my $class = ref( $caller ) || $caller;
# Is this a call on a retrievable object?
if( ref( $caller ) and
$caller->retrievable ){
$self = $caller;
$statefile = $self->statefile;
}
bless( $self, $class );
# Recover serialised object
if( ! -f $statefile ){
$self->throw( "Token $statefile is not found" );
}
my $io = Bio::Root::IO->new( $statefile );
local $/ = undef();
my $state_str = $io->_readline;
# Dynamic-load modules required by stored object
my $stored_obj;
my $success;
for( my $i=0; $i<10; $i++ ){
eval{ $stored_obj = thaw( $state_str ) };
if( ! $@ ){ $success=1; last }
my $package;
if( $@ =~ /Cannot restore overloading/i ){
my $postmatch = $';
if( $postmatch =~ /\(package +([\w\:]+)\)/ ) {
$package = $1;
}
}
if( $package ){
eval "require $package"; $self->throw($@) if $@;
}
else{ $self->throw($@) }
}
if( ! $success ){ $self->throw("maximum number of requires exceeded" ) }
if( ! ref( $stored_obj ) ){
$self->throw( "Token $statefile returned no data" );
}
map{ $self->{$_} = $stored_obj->{$_} } keys %$stored_obj; # Copy hasheys
$self->retrievable(0);
# Maintain class of stored obj
return $self;
}
#----------------------------------------------------------------------
=head2 clone
Arg [1] : none
Function : Returns a clone of the calling object
Returntype: Bio::Root::Storable inhereting object
Exceptions:
Caller :
Example : my $clone = $obj->clone();
=cut
sub clone {
my $self = shift;
return dclone( $self );
}
#----------------------------------------------------------------------
=head2 remove
Arg [1] : none
Function : Clears the stored object from disk
Returntype: boolean
Exceptions:
Caller :
Example : $obj->remove();
=cut
sub remove {
my $self = shift;
if( -e $self->statefile ){
unlink( $self->statefile );
}
return 1;
}
#----------------------------------------------------------------------
1;
More information about the Bioperl-l
mailing list