[BioRuby] [GSoC][NeXML and RDF API] Code Review.

Anurag Priyam anurag08priyam at gmail.com
Sun Jul 11 06:51:03 UTC 2010

> This would be a factory, right?
> I think what you want to do is good in its objective - trying to
> shorten the implementation. But do you really need
> class inspection/reflection here? Asking for the class name is usually
> avoided by giving the classes proper attributes, that is, if you
> use OOP at all. The question is whether you really require this.
> The problem with the code I had (and have) was the really wide and
> deep use of OOP classes. That led to duplication of code and little
> 'feel' for correctness of what is in there. Deep OOP hierarchies are
> evil. Duplication is ugly.
> Inspection/reflection is evil too, as Naohisa's reaction suggests. It
> is meant only for exceptional cases where there is no other elegant
> way of resolving an issue. It can be powerful, but only use it when
> really required, as other people often fail to understand what it
> does, and code should be self-documenting.
> I think you need to ask yourself more fundamental questions.
> Why not use BioRuby basic types for most data represented by NeXML?
> Only use special objects when there is real added value. So DnaSeqRow
> would simply be a Sequence (or even list of char) and DnaSeqMatrix
> would be a list of Sequence. If you have further attributes create a
> new composite object (like SequenceFeatures, or if you think more
> functionally again a tuple of sequence(s) and features?).  This way
> you don't create a hierarchy that balloons into hundreds of specialized
> objects we won't use elsewhere. To differentiate between a DnaSequence
> and RnaSequence you do not need different objects. Both are strings
> (in BioRuby). You could even settle for Ruby's primitive types and
> containers.
> Likewise, even if you need a Matrix, you don't need RnaMatrix and
> DnaMatrix. I am sure of that. They are only specializations in name,
> the code in there should be identical.
> If you go down the OOP route, make use of Ruby's mixins. Search
> Google for "ruby mixin deep oop hierarchy".
> My recommendation is to refactor the library to use as primitive a
> type as possible, at every point. When you run into functionality that
> requires a more complex type, because there is no other way - that is
> the moment to design and add it.

Point noted :).
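A minimal Ruby sketch of the object model suggested above, assuming a character matrix is nothing more than a Hash of taxon label to plain String. DNA and RNA matrices share one class and differ only in a `type` attribute, shared behaviour comes from a mixin rather than a subclass, and a Struct serves as the composite only where extra data (features) has to travel with a sequence. `CharMatrix`, `MatrixStats` and `SeqWithFeatures` are hypothetical names, not part of BioRuby or the existing NeXML code.

```ruby
# Hypothetical mixin: shared behaviour for anything exposing a `rows` Hash
# of taxon label => sequence String. It is mixed in, not inherited.
module MatrixStats
  # Number of rows (taxa).
  def ntax
    rows.size
  end

  # Length of the longest row; 0 for an empty matrix.
  def nchar
    rows.values.map(&:length).max || 0
  end
end

# One matrix class for DNA, RNA, protein, ...: the data type is an attribute,
# so there is no DnaMatrix/RnaMatrix pair and no need to ask for a class name.
class CharMatrix
  include MatrixStats

  attr_reader :type, :rows

  def initialize(type)
    @type = type # e.g. :dna or :rna
    @rows = {}   # taxon label => plain String; no DnaSeqRow class
  end

  # Adds a row and returns self so calls can be chained.
  def add_row(taxon, seq)
    @rows[taxon] = seq
    self
  end
end

# A composite is introduced only when there is extra data to carry.
SeqWithFeatures = Struct.new(:seq, :features)

dna = CharMatrix.new(:dna).add_row("t1", "ACGT").add_row("t2", "ACG")
rna = CharMatrix.new(:rna).add_row("t1", "ACGU")
annotated = SeqWithFeatures.new(dna.rows["t1"], [{ start: 1, stop: 2 }])

[dna, rna].each { |m| puts "#{m.type}: #{m.ntax} rows, #{m.nchar} chars" }
```

Asking `m.type` here plays the role that asking for the class name would otherwise play, and the sequences themselves stay ordinary Strings.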

> I don't know the full depth of the NeXML format, but I can predict
> it consists of primitive types in ordered ways. This can be mirrored
> by the implementation. If you do it like this you won't have to use
> inspection (as in the question above). OOP classes are for harnessing
> special functionality that goes with a certain type. Do not create a type
> unless you need something special.

The fact that I do not know anything about bio* and phylo* also leads
to some confusion :P. Due to some rotten luck I was not able to confer
with Rutger. I will discuss it with him and refactor the code keeping
your suggestions in mind.
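One way to mirror the format without inspection, sketched under the assumption that the `xsi:type` value on a NeXML `characters` block (e.g. `nex:DnaSeqs`) is what selects the data type: the factory becomes a plain lookup table instead of reflection such as `Object.const_get`, and it returns ordinary Ruby containers. `DATA_TYPES` and `build_matrix` are hypothetical names, and the table lists only three types for illustration.

```ruby
# Hypothetical lookup table: NeXML xsi:type string => data type symbol.
# Supporting a new type means adding a line here, not a new class.
DATA_TYPES = {
  "nex:DnaSeqs"     => :dna,
  "nex:RnaSeqs"     => :rna,
  "nex:ProteinSeqs" => :protein,
}.freeze

# Factory without reflection: the type is data, looked up in a Hash.
# Hash#fetch raises KeyError for an unknown xsi:type instead of guessing.
def build_matrix(xsi_type, rows)
  { type: DATA_TYPES.fetch(xsi_type), rows: rows.to_h }
end

m = build_matrix("nex:RnaSeqs", [["t1", "ACGU"], ["t2", "ACUA"]])
puts m[:type]       # rna
puts m[:rows]["t2"] # ACUA
```

The result is a plain Hash of primitives, so nothing downstream has to ask what class it was given.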

> You can propose changes to existing BioRuby types - in particular
> with the RDF implementation.
> I know some people will balk at this rewrite - but to be honest, if
> you want your library to be useful to others it needs rethinking. I
> would take a week out of your plan to experiment with different object
> models - just start with a small subset. When you think something
> works, roll it out all the way. That can be done quickly. Read, read,
> read on the Internet about object models.
> One thing you can consider is to use an intermediate object structure
> for parsing the XML into Ruby, and then fork it out into logical
> data structures. I do that regularly as the XML 'model' does not
> normally map to Ruby well. One example of mine is here
>   http://github.com/pjotrp/swig2doc/blob/master/lib/input/doxyxmlparser.rb
> Doxy objects are stored in
>   http://github.com/pjotrp/swig2doc/blob/master/lib/cobj/doxy/doxycobjs.rb
> Note swig2doc also contains a convenience class for using libxml2 in
>   http://github.com/pjotrp/swig2doc/blob/master/lib/input/xmleasyreader.rb
> And while you are at refactoring, why not make sure the parser does
> not fill memory.
> Pj.
> PS. Are you using another NeXML OOP implementation as a model - Perl,
> Python, Java? I would like to know, so I can have a look.

I am not using one as a model, but I sometimes refer to the Python
implementation: http://nexml.org/nexml/python/
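A rough sketch of the two-stage parsing idea from the review, with the memory concern folded in. Stage one listens to parser events and keeps only the row currently being read as a small intermediate Hash; stage two forks those rows out into a plain OTU-to-sequence Hash. It uses REXML's stream API (`REXML::Document.parse_stream` with `REXML::StreamListener`) in place of the libxml2 bindings mentioned in the review, only because REXML ships with Ruby. The XML is a simplified NeXML-like fragment, not a complete valid NeXML document, and `RowListener`/`read_rows` are hypothetical names.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Stage 1: a streaming listener. It holds only the row currently being read,
# as a small intermediate Hash, and hands each finished row to a block, so the
# parser never builds a tree of the whole document.
class RowListener
  include REXML::StreamListener

  def initialize(&on_row)
    @on_row = on_row
    @row = nil
    @in_seq = false
  end

  def tag_start(name, attrs)
    attrs = attrs.to_h # attribute name => value
    case name
    when "row" then @row = { "otu" => attrs["otu"], "seq" => +"" }
    when "seq" then @in_seq = !@row.nil?
    end
  end

  # Collect sequence text, dropping any whitespace.
  def text(text)
    @row["seq"] << text.gsub(/\s+/, "") if @in_seq
  end

  def tag_end(name)
    case name
    when "seq" then @in_seq = false
    when "row"
      @on_row.call(@row)
      @row = nil
    end
  end
end

# Stage 2: fork the intermediate rows out into a plain logical structure,
# OTU id => sequence String. No row or matrix classes are needed.
def read_rows(xml)
  rows = {}
  listener = RowListener.new { |row| rows[row["otu"]] = row["seq"] }
  REXML::Document.parse_stream(xml, listener)
  rows
end

xml = <<~XML
  <characters>
    <matrix>
      <row id="r1" otu="t1"><seq>ACGT</seq></row>
      <row id="r2" otu="t2"><seq>ACGA</seq></row>
    </matrix>
  </characters>
XML

read_rows(xml).each { |otu, seq| puts "#{otu}: #{seq}" }
```

Because no document tree is built, memory use is bounded by what the block chooses to keep; this one keeps every row, but a caller could just as well write each row out as it arrives.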

Anurag Priyam,
2nd Year Undergraduate,
Department of Mechanical Engineering,
IIT Kharagpur.
