[Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs

Wed Jan 7 14:33:21 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #25 from lpritc at scri.sari.ac.uk  2009-01-07 09:33 EST -------
(In reply to comment #17)

> 1) I do not understand the need for the dictionary of modules 'formatdict' in
> _write as it creates unnecessary inefficient code. The options need to be part
> of the check for the type of output.

The need is that input types are associated with alternative rendering
backends.  The distribution dictionary approach is highly-readable and readily
extendable to accept, for example, lowercase variants of format names that map
to the same backend - as in your point number 2.

I also don't understand your efficiency argument.  Firstly, this step is not
AFAIAA a bottleneck, and hardly a priority for optimisation; secondly I do not
believe that a distribution dictionary is less efficient than your suggestion. 
The dictionary achieves the same end in three lines of code, rather than ten
for the elif.  Also computationally, if the format name is 'TIF', your elif
code will always have to cycle through all output format name tests (four
conditionals, and an O(n) list search) in order to associate that format with
renderPM.  This is less efficient than a dictionary approach: retrieving values
from dictionaries takes approximately constant time. Not that if we ran profile
on the two approaches we'd see much of a difference, of course - this is not a
speed-critical step.

Also, and in my opinion, elifs are not as easy to maintain, or as readable, as
distribution dictionaries.

> 2) There is no indication that the output for write and write_to_string only
> accepts uppercase. Note the _write function states this but a user will not see
> these. I do not understand why lowercase is unacceptable. 

It's not unacceptable - at least, not to me - I just didn't write it to accept
lowercase, originally.  I've no objection to adding lowercase variants of the
format names to the distribution dictionary.

> 3) The check for renderPM at start is really redundant because _write checks
> for it (well sort of). It is also an unnecessary delay if renderPM is not used.

It's not a big speed hit (or is there contradictory data? it's certainly not a
speed worry for my work) and, if tested on import, needs only to be done once
when GenomeDiagram is imported.

> 4) There is no test for the presence of renderPM. The test function must check
> for renderPM and should at least provide a warning if not present. Otherwise
> this is a surprise to a user because not all options will be available.

Raising an error, or at least a warning, is a good idea.  I favour raising this
error on first import.

> 5) The installation documentation must also indicate that renderPM is optional
> and also how to install the renderPM module.

I'm still not convinced that this is all that big an issue: renderPM is part of
the source ReportLab 2.2 distribution, and the instructions on reportlab.org
are pretty clear.  However, for those users who have pathological
installations, a line pointing out that renderPM can be obtained via
reportlab.org is a good idea.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.