[Bioperl-l] First commit of Bio::Structure objects
ed
ed@compbio.berkeley.edu
Fri, 16 Nov 2001 14:17:21 -0800
I would like to check in on this subject. I have written two bioperl
parsers that deal with structure information, and I will submit them
soon. Specifically, these modules parse STRIDE and DSSP output. STRIDE
and DSSP take PDB structure files as input and define secondary
structural elements within them (and do some other neat things). It's
not clear to me where these modules would go within the bioperl class
hierarchy. Currently I just have them inherit from Bio::Root. They
_would_ be Bio::AnalysisResult objects (implementing
Bio::SeqAnalysisParserI) except that they don't take Sequence objects as
input -- they take structures (PDB files). Therefore, the parallel
hierarchy Kris has proposed (Bio::Seq / Bio::Structure) seems like a
good and necessary idea.

An issue that should be considered is the relationship between
Bio::Structure and Bio::Seq objects. Naturally, it would be useful to
extract sequence information from Bio::Structure objects, but that can
be very difficult to do correctly, especially when dealing with PDB
files.

It would also be useful, I think, once Bio::Structure is settled, to
implement parallel Bio::Factory:: modules
(Bio::Factory::StructureAnalysisParserFactory) for structural data
programs like STRIDE, DSSP, and others. Currently my modules cannot set
up Factory objects or generate the program output themselves; they
simply parse existing results.

Glad to see bioperl is taking on structure, and I look forward to
helping out.

Regards,
Ed Green
*****************************
Ed Green
Brenner Research Group
Univ. California / Berkeley
510-642-9614
ed@compbio.berkeley.edu
http://compbio.berkeley.edu
*****************************
Chris Mungall wrote:
>
>On Fri, 16 Nov 2001, Kris Boulez wrote:
>
>>Quoting Chris Mungall (cjm@fruitfly.bdgp.berkeley.edu):
>>
>>>Hi Kris
>>>
>>>The object model design looks very sound. However, I noticed that there
>>>are cycles in the object graph (eg bidirectional links between Atom and
>>>Residue).
>>>
>>I see your point :(
>>So it shows that this is the first real set of objects I have designed.
>>
>
>Well, your objects seem conceptually perfect to me, you've just hit a
>nasty operational consideration.
>
>>>This causes problems for perl, as the garbage collector gets confused
>>>about reference counts and won't clean up properly. (IMHO this is part of
>>>a larger problem with object oriented design as a whole, as attributes
>>>aren't first class entities in their own right)
>>>
>>>This won't be a problem for reading in a few PDB records, but, say,
>>>cycling through all of PDB will cause your memory usage to go up and up. I
>>>learned this lesson the hard way with my first perl object model, after
>>>doing java object models, it was a painful business going back and
>>>refactoring the code :@(
>>>
>>>One way around this is to keep your own reference counts and override the
>>>DESTROY method to make sure everything is cleared - this can be tricky.
>>>
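
The cycle problem described above can be reproduced in a few lines. The
sketch below is only an illustration: Demo::Node is a hypothetical
stand-in for the Residue and Atom objects, not the Bio::Structure code.
It also uses a different remedy from the manual reference counting
suggested above, namely weakening the child-to-parent link with
Scalar::Util::weaken, which assumes a perl recent enough to support
weak references.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in for Residue and Atom; not the Bio::Structure classes.
package Demo::Node;
our $destroyed = 0;    # counts objects that were actually freed
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub DESTROY { $destroyed++ }

package main;

# Build one residue with one atom; the atom points back at its residue.
sub build_pair {
    my ($weak) = @_;
    my $residue = Demo::Node->new(name => 'ALA');
    my $atom    = Demo::Node->new(name => 'CA', residue => $residue);
    $residue->{atoms} = [$atom];          # parent -> child (strong link)
    weaken($atom->{residue}) if $weak;    # child -> parent made weak on request
    return;                               # both lexicals go out of scope here
}

build_pair(0);
my $freed_with_cycle = $Demo::Node::destroyed;    # strong cycle: nothing freed

$Demo::Node::destroyed = 0;
build_pair(1);
my $freed_with_weak_ref = $Demo::Node::destroyed; # weak back-link: both freed

print "freed with strong cycle:   $freed_with_cycle\n";
print "freed with weak back-link: $freed_with_weak_ref\n";
```

With the strong back-link, neither object's reference count reaches zero
when build_pair returns, so DESTROY does not run until the program
exits. Repeating that for every PDB entry is what makes memory grow.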
>>If the assumption is that when an object is destroyed, all its children
>>(everything underneath it) are destroyed too, this should be doable.
>>
>
>unfortunately it's a bit trickier than that; it's a while since I thought
>about all this (the last time I did it made my head hurt), but I'm pretty
>sure that you have to also force the client code to explicitly free the
>objects once they are no longer in use, a la C coding. This is a bit of a
>burden for the potential users of your objects.
>
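
The burden of explicit freeing mentioned above might look roughly like
this. It is a sketch under stated assumptions: Demo::Residue, add_atom
and release are hypothetical names chosen for illustration, and atoms
are plain hashes rather than real objects. The point is only that a
forgotten release() call leaves the object alive.

```perl
use strict;
use warnings;

# Hypothetical parent class; atoms are plain hashes to keep the sketch short.
package Demo::Residue;
our $live = 0;    # number of Demo::Residue objects currently alive
sub new { my ($class) = @_; $live++; return bless { atoms => [] }, $class }
sub add_atom {
    my ($self, $name) = @_;
    # The 'residue' entry is the back-link that closes the cycle.
    push @{ $self->{atoms} }, { name => $name, residue => $self };
    return;
}
# Explicit teardown: drop every child -> parent link, then the children.
sub release {
    my ($self) = @_;
    delete $_->{residue} for @{ $self->{atoms} };
    $self->{atoms} = [];
    return;
}
sub DESTROY { $live-- }

package main;

for my $i (1 .. 3) {
    my $residue = Demo::Residue->new;
    $residue->add_atom('CA');
    $residue->release if $i > 1;    # the first residue is never released
}
my $leaked = $Demo::Residue::live;
print "residues still alive after the loop: $leaked\n";
```

Only the residue whose release() call was skipped survives the loop,
which is the kind of mistake client code can easily make.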
>If you force the users of your objects to go through a Factory for
>obtaining objects, this can be mitigated.
>
>Maybe perl6 will sort this out?
>
>
>>>Another way would be to have everything go through a singleton
>>>ProteinData object which would hold all parent/child relationships. The
>>>Residue/Atom objects would be unaware of their reciprocal links; the client
>>>code would have to ask the ProteinData object for this.
>>>
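
One possible reading of the ProteinData idea above is a registry that
records parent/child links outside the objects themselves. The sketch
below is hypothetical throughout: Demo::ProteinData and its methods are
invented for illustration, plain hashes stand in for Residue and Atom,
and singleton enforcement is left out. Links are keyed by
Scalar::Util::refaddr so the objects never point at each other.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical registry holding all parent/child relationships.
package Demo::ProteinData;
sub new {
    my ($class) = @_;
    return bless { parent_of => {}, children_of => {} }, $class;
}
# Record that $child belongs to $parent; neither object is modified.
sub add_child {
    my ($self, $parent, $child) = @_;
    $self->{parent_of}{ Scalar::Util::refaddr($child) } = $parent;
    push @{ $self->{children_of}{ Scalar::Util::refaddr($parent) } }, $child;
    return;
}
sub parent_of {
    my ($self, $child) = @_;
    return $self->{parent_of}{ Scalar::Util::refaddr($child) };
}
sub children_of {
    my ($self, $parent) = @_;
    return @{ $self->{children_of}{ Scalar::Util::refaddr($parent) } || [] };
}

package main;

my $data    = Demo::ProteinData->new;
my $residue = { name => 'ALA' };    # plain hashes stand in for Residue/Atom
my $ca      = { name => 'CA' };
my $cb      = { name => 'CB' };
$data->add_child($residue, $_) for ($ca, $cb);

# Client code asks the registry, not the objects, for the relationships.
my $parent_name = $data->parent_of($ca)->{name};
my @atom_names  = map { $_->{name} } $data->children_of($residue);
print "$parent_name has atoms: @atom_names\n";
```

Because only the registry holds the links, dropping the registry
releases everything it refers to and no cycle is formed. The cost is
that every navigation step has to go through it.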
>>>I notice you haven't checked in your StructureI object yet, so I can't run
>>>the tests - I may be missing something
>>>
>>The StructureI object hasn't been checked in, as it doesn't exist yet.
>>This was one of my questions in my mail: should I go for one StructureI
>>or for separate EntryI, ChainI,... files. Also: should every public
>>method have an Interface definition ?
>>
>
>I'll leave this one to the other bioperlers - I'd say just one StructureI
>for now, if in the future there are other implementations of these objects
>it shouldn't be too hard to add interfaces in then.
>
>>I ran the tests as follows
>>
>> % cd bioperl-live/t
>> % perl -I.. ./Structure.t
>>
>>Kris,
>>--
>>Kris Boulez Tel: +32-9-241.11.00
>>AlgoNomics NV Fax: +32-9-241.11.02
>>Technologiepark 4 email: kris.boulez@algonomics.com
>>B 9052 Zwijnaarde http://www.algonomics.com/
>>
>