[Biopython-dev] Implementation advice

Jeffrey Chang jchang at smi.stanford.edu
Mon Jun 17 14:55:44 EDT 2002


At the Biohackathon in April(?), we talked about the need to provide
this kind of database capability, and the 4 projects (biopython,
bioperl, biojava, and bioruby) decided to standardize on 2
cross-platform approaches.  For smaller databases, we invented our own
flat file format.  For larger ones, we used Berkeley DB.  Andrew wrote
some excellent documentation for these, but I can't find it right now.

Andrew has already implemented both of these in Bio.Mindy.  Please take a
look there.  The advantages of using one of these are that 1) the db
stuff is already written, and 2) the resulting file will be usable by
the other bio projects as well.

Jeff


On Mon, Jun 17, 2002 at 05:24:26PM +0300, Iddo Friedberg wrote:
> Hi all,
> 
> I am trying to expand the functionality of FSSP a bit. As part of that, I
> would like to provide the user with the ability to give a PDB id, and
> retrieve the name of the FSSP file(s) containing that PDB id.
> 
> Without going into too much detail, each FSSP file (out of some 2800)
> lists anywhere between 3 and 300 PDB ids, and some PDB ids appear in more
> than one file.
> 
> I was thinking of creating a dictionary which will look something like:
> { '1chyA': ['1xyzB','3fgy0'],
>   '3dcp0': ['3syx'],
>   '2abcC': ['3syx', '4rde'],
> .
> .
> .
> }
> # Meaning, that 1chyA is in the FSSP file represented by 1xyzB and in the
> # one represented by 3fgy0
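
A minimal sketch of how a dictionary of this shape could be built, assuming
the contents of each FSSP file have already been parsed into a mapping from
the file's representative to the PDB ids it lists. The function name
build_pdb_index and the input mapping fssp_members are hypothetical, made up
for illustration; they are not part of Biopython.

```python
from collections import defaultdict


def build_pdb_index(fssp_members):
    """Invert {fssp_representative: [pdb_id, ...]} into
    {pdb_id: [fssp_representative, ...]}."""
    index = defaultdict(list)
    # Sort the representatives so the resulting lists have a stable order.
    for representative in sorted(fssp_members):
        for pdb_id in fssp_members[representative]:
            # Skip duplicates in case a file lists the same PDB id twice.
            if representative not in index[pdb_id]:
                index[pdb_id].append(representative)
    return dict(index)


# Hypothetical input: the PDB ids listed in each FSSP file,
# keyed by the representative that names the file.
fssp_members = {
    "1xyzB": ["1chyA"],
    "3fgy0": ["1chyA"],
    "3syx": ["3dcp0", "2abcC"],
    "4rde": ["2abcC"],
}

pdb_index = build_pdb_index(fssp_members)
print(pdb_index["1chyA"])
```

The inversion is the one-time step; lookups afterwards are plain dictionary
accesses by PDB id.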
> 
> Dictionary creation will be a one-time thing, updates will be as frequent as
> the user likes (not very frequent), and queries will be many (very
> frequent). The dictionary seems a bit large (some 2800 keys, and rising) to
> read in every time you need to find out where 2abcC is located, so I thought
> of using the Python dbm interface, specifically 'anydbm', so as to maximize
> platform independence.
> 
> ***** Is this good so far? Or is there a better tool I can use? I don't
> want to use SQL here... it seems like a bit of overkill.
> 
> Because anydbm (like gdbm, dumbdbm...) accepts only strings for keys and
> values, and I'd like to use lists in the values (maybe also in the keys),
> I thought of creating a UserDict subclass which overrides __getitem__,
> __setitem__, etc., using cPickle.loads and cPickle.dumps for keys and
> values, thus transparently enabling the use of non-strings in a Python dbm
> interface. (Bit of code attached).
> 
> **** This seems like a very generic application. I'd be extremely surprised
> if nobody has done something like this before. But I couldn't really find
> anything. Comments?
> 
> 
> Thanks,
> 
> Iddo
> 
> 
> --
> 
> Iddo Friedberg                                  | Tel: +972-2-6757374
> Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
> The Hebrew University - Hadassah Medical School | email: idoerg at cc.huji.ac.il
> POB 12272, Jerusalem 91120                      |
> Israel                                          |
> http://bioinfo.md.huji.ac.il/marg/people-home/iddo/
> 
> 
> 
> 
> 

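> # Python 2 modules: cPickle, UserDict and anydbm
> import cPickle
> import UserDict
> import anydbm
> loads = cPickle.loads
> dumps = cPickle.dumps
> class dbmDict(UserDict.UserDict):
> 	"""Dictionary backed by an anydbm file; keys and values are pickled."""
> 	def __init__(self,filename, flag='r'):
> 		# self.data is the open dbm object, which stores only strings
> 		self.data = anydbm.open(filename,flag)
> 	def __getitem__(self,key):
> 		return loads(self.data[dumps(key)])
> 	def __setitem__(self,key, value):
> 		self.data[dumps(key)] = dumps(value)
> 	def __delitem__(self,key):
> 		del self.data[dumps(key)]
> 	def has_key(self,key):
> 		# look up the pickled key, since that is what is stored
> 		return self.data.has_key(dumps(key))
> 	def values(self):
> 		# unpickle every stored value
> 		return [loads(self.data[k]) for k in self.data.keys()]
> 	def keys(self):
> 		# unpickle every stored key
> 		return [loads(k) for k in self.data.keys()]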
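
For comparison, the standard library's shelve module already pickles values
on top of a dbm backend, although its keys must be plain strings. Below is a
minimal sketch in modern Python 3 (not the Python 2 of the attached code),
assuming string PDB ids as keys and lists of FSSP representatives as values.
The sample data and the temporary file location are made up for
illustration. Note that the on-disk format depends on whichever dbm backend
is available, so the file is not guaranteed to be portable between machines.

```python
import os
import shelve
import tempfile

# Hypothetical index, same shape as the dictionary in the message above.
pdb_index = {
    "1chyA": ["1xyzB", "3fgy0"],
    "3dcp0": ["3syx"],
    "2abcC": ["3syx", "4rde"],
}

# Assumption: a throwaway location; a real index would live at a fixed path.
db_path = os.path.join(tempfile.mkdtemp(), "fssp_index")

# One-time build: shelve pickles each value; keys must be strings.
with shelve.open(db_path, flag="c") as db:
    for pdb_id, fssp_files in pdb_index.items():
        db[pdb_id] = fssp_files

# Frequent queries: open read-only and fetch single keys without
# loading the whole index into memory.
with shelve.open(db_path, flag="r") as db:
    hits = db.get("2abcC", [])
    missing = db.get("9zzzZ", [])

print(hits)
```

Unlike the dbmDict class above, this does not support non-string keys; those
would still need to be serialized to strings by hand.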



