Bioperl: Any non-redundant database tools out there ??? (fwd)

Ewan Birney birney@sanger.ac.uk
Thu, 27 Aug 1998 17:25:41 +0100 (BST)



Gordon posted this to 'guts', but it seems much more
appropriate for the main mailing list, hence I am
forwarding it.



Ewan Birney
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/

---------- Forwarded message ----------
Date: Thu, 27 Aug 1998 11:11:50 -0500
From: Gordon D. Pusch <pusch@mcs.anl.gov>
To: vsns-bcd-perl-guts@lists.uni-bielefeld.de
Subject: Bioperl-guts: Any non-redundant database tools out there ???

Hi --- I am trying to construct a ``non-redundant'' version of WIT's
sequence database. An obvious stupid-but-simple way to do this would
be to use the sequence itself as the key to a hash of ID lists.
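
As a rough illustration of the ``stupid-but-simple'' idea, here is a minimal
in-memory sketch in Python (chosen only for brevity; the same shape carries
over to a perl hash of array refs). The function name and the toy records
are hypothetical and are not taken from WIT.

```python
from collections import defaultdict


def group_ids_by_sequence(records):
    """Map each distinct sequence string to the list of IDs that carry it."""
    ids_by_seq = defaultdict(list)
    for seq_id, seq in records:
        # The sequence itself is the hash key; identical sequences collapse
        # onto one entry and their IDs accumulate in the list.
        ids_by_seq[seq].append(seq_id)
    return dict(ids_by_seq)


# Hypothetical toy records: (ID, sequence) pairs, not real database entries.
records = [("id1", "MKVLA"), ("id2", "MKT"), ("id3", "MKVLA")]
nr = group_ids_by_sequence(records)
```

Each key of `nr` is one non-redundant sequence, and its value lists every ID
that shares it.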

However, since there are a LOT of sequences, the whole thing obviously
won't fit into memory and we will have to store the hash as a Berkeley-DB;
and of course, some of the sequences are quite long.  I worry about such
enormously long keys ``breaking'' something in either perl5 or Berkeley-DB's
hash routines ---I gather they are stored internally as B-trees, so I
could easily imagine very long keys producing stack-overflows during a
tree traversal if the trees got too deep... :-(
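
One possible way to sidestep the long-key worry, sketched here under stated
assumptions rather than as a tested answer, is to key the on-disk table by a
fixed-length digest of each sequence instead of the sequence itself. In this
Python sketch, SHA-1 is an arbitrary choice of digest, `dbm.dumb` merely
stands in for Berkeley-DB, and `add_record` and the toy records are
hypothetical. Two different sequences could in principle share a digest, so a
careful builder would still compare the full sequences whenever digests match.

```python
import dbm.dumb
import hashlib
import os
import tempfile


def add_record(db, seq_id, seq):
    """Append seq_id to the ID list stored under the SHA-1 digest of seq."""
    # The key is always 40 hex characters, however long the sequence is.
    key = hashlib.sha1(seq.encode("ascii")).hexdigest().encode("ascii")
    if key in db:
        db[key] = db[key] + b"\t" + seq_id.encode("ascii")
    else:
        db[key] = seq_id.encode("ascii")
    return key


# Hypothetical toy data; the first and third sequences are identical and long.
toy = [("id1", "MKVLA" * 10000), ("id2", "MKT"), ("id3", "MKVLA" * 10000)]

workdir = tempfile.mkdtemp()
db = dbm.dumb.open(os.path.join(workdir, "nr_index"), "c")
keys = [add_record(db, seq_id, seq) for seq_id, seq in toy]
groups = {k: db[k].decode("ascii").split("\t") for k in db.keys()}
db.close()
```

The sequences themselves would then live in a separate flat file or table,
with the digest table holding only the short keys and the ID lists.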

Has anyone on this list implemented a non-redundant database-builder 
in perl ???  

Does anyone know if there =IS= a limit as to how long a hash-key
can be for either perl5 or Berkeley-DB ???  If so, what are the usual
failure-modes ???

Can anyone suggest a more elegant algorithm than the ``stupid-but-simple'' 
method outlined above ???


Thanks in advance,

--  Gordon D. Pusch   <pusch@mcs.anl.gov>

Disclaimer:  I'm a consultant collaborating with Argonne researchers;
I don't speak for ANL or the DOE --- and they *certainly* don't speak
for =ME= !!!

Claimer:  I report =ALL= SPAMvertisers to their ISP --- =NO= exceptions !!!

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl-guts.html
====================================================================
