[Biopython-dev] Module reorganization for upcoming Bio.PDB enhancements

João Rodrigues anaryin at gmail.com
Tue Jun 8 17:10:48 UTC 2010


Hello all,

I'm replying here to what Thomas wrote on the GSOC Report thread because it
seems a better place.

> PDB files can contain anything: RNA, DNA, sugars, small molecules... It is
> thus not a good idea to directly associate protein-specific methods with the
> Structure class; it will lead to a bloated Structure class and a lot of
> irrelevant methods (e.g. search_ss_bonds is meaningless for a PDB file that
> contains RNA).


Agreed.

> Currently, one creates Polypeptide objects from a Structure object using a
> factory design pattern (via PPBuilder); the Polypeptide class implements
> some protein-specific methods. I believe that is a much cleaner way to do it
> (though we need a Protein class that represents collections of connected
> polypeptides). One can also make sure that all such derived objects
> (Protein, NA, DNA,...) adhere to the same interface by providing a suitable
> base class with shared functionality; in that way, the whole thing is also
> extensible.
>
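
The base-class idea in the quoted paragraph might look roughly like this. It is a minimal, hypothetical sketch in plain Python, not Biopython code: `MoleculeView`, `ProteinView`, `DNAView` and `build_views` are invented names, and residues are simple `(chain, number, name)` tuples rather than Bio.PDB objects. The factory plays the role PPBuilder plays today, and the abstract base class enforces the shared interface.

```python
from abc import ABC, abstractmethod

# Canonical amino-acid three-letter codes mapped to one-letter symbols.
THREE_TO_ONE = dict(zip(
    ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
     "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"],
    "ARNDCQEGHILKMFPSTWYV",
))
# Deoxyribonucleotide residue names as used in PDB files.
DNA_BASES = {"DA", "DC", "DG", "DT"}


class MoleculeView(ABC):
    """Shared interface for all molecule-specific views (hypothetical)."""

    def __init__(self, residues):
        # residues: list of (chain_id, residue_number, residue_name) tuples
        self.residues = list(residues)

    def __len__(self):
        return len(self.residues)

    @abstractmethod
    def sequence(self):
        """Return the one-letter sequence; each subclass decides how."""


class ProteinView(MoleculeView):
    def sequence(self):
        return "".join(THREE_TO_ONE[name] for _, _, name in self.residues)

    def cysteine_positions(self):
        # Protein-specific: candidates for a disulfide-bond search.
        return [(chain, num) for chain, num, name in self.residues if name == "CYS"]


class DNAView(MoleculeView):
    def sequence(self):
        # "DG" -> "G", "DC" -> "C", etc.
        return "".join(name[1] for _, _, name in self.residues)

    def gc_content(self):
        # DNA-specific: fraction of G/C residues.
        gc = sum(1 for _, _, name in self.residues if name in ("DG", "DC"))
        return gc / len(self)


def build_views(residues):
    """Factory in the spirit of PPBuilder: split residues into typed views."""
    protein = [r for r in residues if r[2] in THREE_TO_ONE]
    dna = [r for r in residues if r[2] in DNA_BASES]
    views = []
    if protein:
        views.append(ProteinView(protein))
    if dna:
        views.append(DNAView(dna))
    return views
```

Molecule-specific methods such as `gc_content()` exist only on the view where they make sense, so a `ProteinView` never carries nucleic-acid methods, while every view still answers `sequence()` and `len()`.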

I think there has already been some discussion about this. My personal
suggestion is a layout like:

Bio.PDB/
    Protein.py
    DNA.py
    RNA.py

which would translate into usage along these lines:

from Bio.PDB import Protein
structure = Protein('1ABC.pdb')
structure.search_ss_bonds()

but not

structure.calc_melting_temperature()  (just an example of a method that makes no sense for a protein)
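
A minimal sketch of the proposed `Protein` class, under explicit assumptions: the class name comes from the proposal, but this is not an existing Biopython API. Where the proposal has `Protein()` call `PDBParser()`, this sketch reads ATOM records itself from PDB-format text so that it stays self-contained, and `search_ss_bonds` simply pairs cysteine SG atoms closer than an assumed 2.5 Å cutoff (an S-S bond is about 2.05 Å long).

```python
import math


class Protein:
    """Hypothetical protein-specific wrapper, as proposed for Bio.PDB.

    The proposal has Protein() call PDBParser(); to keep this sketch
    self-contained it reads fixed-column ATOM records from PDB text instead.
    """

    def __init__(self, pdb_text):
        self.atoms = []
        for line in pdb_text.splitlines():
            if not line.startswith("ATOM"):
                continue  # skip HETATM, HEADER, etc.
            self.atoms.append({
                "name": line[12:16].strip(),      # columns 13-16
                "resname": line[17:20].strip(),   # columns 18-20
                "chain": line[21],                # column 22
                "resseq": int(line[22:26]),       # columns 23-26
                "coord": (float(line[30:38]),     # x, columns 31-38
                          float(line[38:46]),     # y, columns 39-46
                          float(line[46:54])),    # z, columns 47-54
            })

    def search_ss_bonds(self, cutoff=2.5):
        """Return pairs of cysteines whose SG atoms are within `cutoff` angstroms.

        The 2.5 A default is an assumed threshold; a disulfide bond is ~2.05 A.
        """
        sulfurs = [a for a in self.atoms if a["resname"] == "CYS" and a["name"] == "SG"]
        bonds = []
        for i, first in enumerate(sulfurs):
            for second in sulfurs[i + 1:]:
                if math.dist(first["coord"], second["coord"]) <= cutoff:
                    bonds.append(((first["chain"], first["resseq"]),
                                  (second["chain"], second["resseq"])))
        return bonds
```

Because `calc_melting_temperature` is never defined on `Protein`, calling it raises `AttributeError`, which is exactly the separation the proposal asks for.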

Protein() would call PDBParser(). It could also include, to a certain
extent, an Alphabet-like feature to ensure residue names are valid (this ties
in with this proposal: http://www.biopython.org/wiki/GSOC2010_Joao#Residue_name_normalisation).
I believe this overlaps with what you described: a class that essentially
wraps what we do now (Bio.PDB.PDBParser) and adds molecule-specific
methods. However, it also raises some problems, Protein/DNA complexes being
the first that comes to mind.
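
Returning to the Alphabet-like residue-name check mentioned above, it could start as small as the following. This is a hypothetical helper (`normalise_residue_names` and its `aliases` parameter are invented for illustration): it strips and upper-cases each name, applies an optional alias table, and reports anything outside the 20 standard amino-acid codes. Mapping selenomethionine (`MSE`) to `MET` is shown only as an example alias, not as a recommended policy.

```python
# The 20 standard amino-acid residue names.
STANDARD_AMINO_ACIDS = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


def normalise_residue_names(names, aliases=None, allowed=STANDARD_AMINO_ACIDS):
    """Normalise residue names and report the ones that are not recognised.

    Returns (normalised_names, unknown), where unknown is a list of
    (index, original_name) pairs for names outside `allowed`.
    """
    aliases = aliases or {}
    normalised, unknown = [], []
    for index, raw in enumerate(names):
        name = raw.strip().upper()        # tolerate padding and lower case
        name = aliases.get(name, name)    # e.g. map a modified residue to its parent
        if name not in allowed:
            unknown.append((index, raw))
        normalised.append(name)
    return normalised, unknown
```

Keeping the allowed set as a parameter means a DNA or RNA class could reuse the same check with its own alphabet.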

How does this sound? I think it is in line with what Eric said in the first
post of this thread and what Thomas replied in the GSOC thread. We should
also rename the PDB module to Struct to better reflect its purpose. All of
the other additions, like Bio.Struct.WWW, would still apply. And I don't see
a major problem with breaking existing code by adding this.

João



