Bioperl: Any non-redundant database tools out there ??? (fwd)
Ewan Birney
birney@sanger.ac.uk
Thu, 27 Aug 1998 17:25:41 +0100 (BST)
Gordon posted this to 'guts', but it seems much more
appropriate for the main mailing list, hence I am
forwarding it.
Ewan Birney
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/
---------- Forwarded message ----------
Date: Thu, 27 Aug 1998 11:11:50 -0500
From: Gordon D. Pusch <pusch@mcs.anl.gov>
To: vsns-bcd-perl-guts@lists.uni-bielefeld.de
Subject: Bioperl-guts: Any non-redundant database tools out there ???
Hi --- I am trying to construct a ``non-redundant'' version of WIT's
sequence database. An obvious stupid-but-simple way to do this would
be to use the sequence itself as the key to a hash of ID lists.
However, since there are a LOT of sequences, the whole thing obviously
won't fit into memory and we will have to store the hash as a Berkeley-DB;
and of course, some of the sequences are quite long. I worry about such
enormously long keys ``breaking'' something in either perl5 or Berkeley-DB's
hash routines --- I gather they are stored internally as B-trees, so I
could easily imagine very long keys producing stack-overflows during a
tree traversal if the trees got too deep... :-(
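For concreteness, the ``stupid-but-simple'' method described above might look
roughly like the sketch below. It is an in-memory illustration only: the
function name build_nonredundant, the toy IDs and sequences, and the decision
to ignore letter case are all assumptions made for the example, not part of
WIT or of any existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group sequence IDs by identical sequence string.
# Input: a list of [id, sequence] pairs.
# Output: a hash ref mapping each distinct sequence to an array ref of IDs.
sub build_nonredundant {
    my @records = @_;
    my %ids_for;                        # sequence string => [ IDs ]
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        $seq = uc $seq;                 # assumption: case is not significant
        push @{ $ids_for{$seq} }, $id;  # the sequence itself is the hash key
    }
    return \%ids_for;
}

# Hypothetical toy data; real input would come from the sequence database.
my $nr = build_nonredundant(
    [ 'seqA', 'MKTAYIAKQR' ],
    [ 'seqB', 'MKVLAAGIVG' ],
    [ 'seqC', 'mktayiakqr' ],           # same as seqA apart from case
);

# One line per distinct sequence: its ID list and its length.
for my $seq (sort keys %$nr) {
    print join(',', @{ $nr->{$seq} }), "\t", length($seq), "\n";
}
```

Moving this to disk would mean tying the hash to a Berkeley-DB file (for
example with DB_File), in which case each ID list would have to be stored as
a single joined string, since tied DB values are plain strings. If very long
keys did turn out to be a problem, one possible substitution would be to key
on a fixed-length digest of the sequence (such as md5_hex from Digest::MD5)
and compare the full sequences only when digests match; that is a different
technique from the one described above, and it has to allow for collisions.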
Has anyone on this list implemented a non-redundant database-builder
in perl ???
Does anyone know if there =IS= a limit as to how long a hash-key
can be for either perl5 or Berkeley-DB ??? If so, what are the usual
failure-modes ???
Can anyone suggest a more elegant algorithm than the ``stupid-but-simple''
method outlined above ???
Thanks in advance,
-- Gordon D. Pusch <pusch@mcs.anl.gov>
Disclaimer: I'm a consultant collaborating with Argonne researchers;
I don't speak for ANL or the DOE --- and they *certainly* don't speak
for =ME= !!!
Claimer: I report =ALL= SPAMvertisers to their ISP --- =NO= exceptions !!!
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl-guts.html
====================================================================