[BioRuby] Bioruby HTML output

Toshiaki Katayama ktym at hgc.jp
Wed Jan 20 00:49:37 UTC 2010


Dear Pj,

On 2010/01/20, at 5:52, Pjotr Prins wrote:

> Dear Toshiaki,
> 
> On Wed, Jan 20, 2010 at 01:21:54AM +0900, Toshiaki Katayama wrote:
>>> I agree with Tomoaki it is too restrictive. What, indeed, if we want
>>> to present the HTML in a different way?
>> 
>> Hmm. Could you provide me some use cases?
> 
> Think of URL's. One user wants to point a gene ID to NCBI. Another
> to Swissprot. The container can not be aware of all exceptions - and
> really should not handle it.

Still not clear to me.

I assumed we would generate a URL string for the href attribute of <a>.
However, are there any IDs that need to be escaped?
Or do you mean embedding an HTML snippet in a URL?
If so, we may need to use URL encoding (URI.escape)
instead of HTML escaping (CGI.escapeHTML).
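
To make the distinction above concrete, here is a minimal sketch of building
a link for an entry ID. The base URL, the link_for helper and the sample ID
are assumptions made up for illustration, not BioRuby code. Note that
URI.escape was removed in Ruby 3.0, so the sketch uses
URI.encode_www_form_component for the URL part, and ERB::Util.html_escape
stands in for CGI.escapeHTML on the HTML part.

```ruby
require 'erb'
require 'uri'

# Hypothetical base URL; a real link would point at NCBI, Swissprot, etc.
BASE_URL = 'https://example.org/entry?id='

# Hypothetical helper: build an <a> element for an entry ID.
def link_for(id, label)
  # URL-encode the ID because it becomes part of the query string.
  href = BASE_URL + URI.encode_www_form_component(id)
  # HTML-escape both the attribute value and the visible text.
  %(<a href="#{ERB::Util.html_escape(href)}">#{ERB::Util.html_escape(label)}</a>)
end

puts link_for('gene A&B', 'A&B <draft>')
```

The ID is encoded once for the URL, and the finished URL and the label are
then escaped for HTML; the two steps solve different problems and are not
interchangeable.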


> 
>> Override the output_html method, or, use some template engine to be
>> more generic.
> 
> Maybe those are good mechanisms. In the pre-hackathon we should
> discuss these points.

Is there a better replacement for Ruby's CGI library?

Requirements:

- separation of the HTML from CGI

CGI.escapeHTML looks ugly in terms of the naming convention (CamelCase)
and the namespace -- why not HTML.escape(string)? Moreover, we don't
want to require 'cgi' just to escape an HTML string.

- support for templates (separation of logic and presentation)

I have used erb and html-template. Sometimes erb is too slow (especially
when the template contains a nested loop that generates many lists or tables).

- bundled with Ruby as a standard library

Otherwise, we had better use Rails as the default environment
(from the viewpoint of popularity).
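
As a rough illustration of the first two requirements, the sketch below pairs
a tiny escaping helper with an erb template. The HTML module, the template and
render_gene_list are hypothetical names invented for this example; nothing
like HTML.escape ships with Ruby or BioRuby, and this is only one way such a
helper might look.

```ruby
require 'erb'

# Hypothetical helper module; not part of Ruby's standard library or BioRuby.
module HTML
  ESCAPES = {
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;'
  }.freeze

  # Replace the five HTML-significant characters with entities.
  def self.escape(string)
    string.to_s.gsub(/[&<>"']/, ESCAPES)
  end
end

# Presentation: the markup lives in a template, apart from the data handling.
GENE_LIST_TEMPLATE = ERB.new(<<~TEMPLATE)
  <ul>
  <% genes.each do |gene| %>
    <li><%= HTML.escape(gene) %></li>
  <% end %>
  </ul>
TEMPLATE

# Logic: hand the data to the template through the local binding.
def render_gene_list(genes)
  GENE_LIST_TEMPLATE.result(binding)
end

puts render_gene_list(['abc<1>', 'T&C'])
```

The escaping helper needs nothing from 'cgi', and the template can be changed
without touching render_gene_list.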


> 
>> I can agree some files became too large to learn and/or maintain.
>> But if we try to change the structure of current code base,
>> we need to define a clean criteria beforehand.
> 
> Yes.
> 
>> If we separate files into sub files, people then need to look around
>> the number of files, and it may also slow down the loading speed of
>> the bioruby library. It is a problem of balance.
>> 
>> In both cases, lack of excellent guide to read through the bioruby
>> library might be a essential issue.
> 
> I think if we structure the files and modules well - and make them
> small enough - they become self-explaining. That would be my ultimate
> goal.
> 
>> At some time, we may do refactoring to produce BioRuby 2.0.
>> Before doing that, we can discuss how to sit all classes/codes cleanly.
>> We may need someone who understand entire structure/contents of
>> the current codebase and willing to design a better one with a good sense.
> 
> Yes. I agree it is a big step. But we should go for this type of
> challenge.
> 
>>> Don't you think the Sequence, or KEGG, object should not care about
>>> HTML? Or RDF, or plotting? Those are separate functionalities. They
>>> share common access patterns - which are part of the DB class.
>> 
>> Again, we can take both approach. My current proposal is conservative one.
>> Just add these functionalities in each class as the class knows what is in it
>> and what is the best way to represent the contents.
>> 
>> If we separate formatting/plotting functionalities into separate class,
>> which might be something like Bio::FlatFile class who knows the header
>> line format of every database entries. Or we may design better one.
> 
> FlatFile has some downsides. It has complicated the libraries.
> Complication means the modules are less easy to adapt/modify. I think
> it is slightly over-engineered. Maybe not enough of a problem to take
> it out, but I hope you see where I am coming from.
> 
>> Anyway, I'm now listening. So, please don't stick with HTML things only
>> and think a global design to which we can plan to migrate.
> 
> I have to spend a day on a writeup. In the coming two weeks. I will
> try to explain my ideas.


OK, let's discuss these topics as well during the pre-hackathon
meeting (7th Feb) in Tokyo with the other core developers.



> 
>> Maybe from esthetics viewpoint?
>> 
>> I think it looks better, and, we can easily switch the output format
>> depending on the context without modifying the code.
>> Something like a @media property in CSS (screen, print etc.) in mind.
>> 
>> if used_for_semantic_web?
>>  format = :rdf
>>  # add some codes to do preparation job for SW
>> elsif used_for_blast?
>>  format = :fasta
>>  # add some codes to do preparation job for blast
>> end
>> 
>> # we don't need to change the following line in any context
>> entry.output(format)
> 
> I see your point. The criticism is that it obfuscates the real
> intention of the code - i.e. it is not self documenting any longer.
> But, I guess, this boils down to preferences and acquired tastes. It
> is not obvious to a newbie, though it may be obvious for someone who
> is accustomed to Bioruby internals. Which may be good - depending on
> our basic values.
> 
> Pj.


Note that you can still call the output_html method directly in each
database class. The output(format) method is provided just as an abstract
interface, which is useful in situations like the one above.

Therefore, both of the following cases should return the same result, and
you can choose the coding style depending on the situation.

# case 1
format = :rdf
entry.output(format)

# case 2
entry.output_rdf

You can also check entry.respond_to?(:output_rdf) in both cases.
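
One way such an abstract interface could be wired up is sketched below. The
Entry class and its output_fasta / output_html bodies are hypothetical
stand-ins, and the dispatch through respond_to? and public_send is an
assumption about the design, not a description of the actual BioRuby code.

```ruby
# Hypothetical stand-in for a database entry class; the real BioRuby
# classes and their output_* methods may look quite different.
class Entry
  def initialize(id, sequence)
    @id = id
    @sequence = sequence
  end

  # Abstract interface: forward to output_<format> when it exists.
  def output(format)
    method_name = "output_#{format}"
    unless respond_to?(method_name)
      raise ArgumentError, "unsupported format: #{format.inspect}"
    end
    public_send(method_name)
  end

  def output_fasta
    ">#{@id}\n#{@sequence}\n"
  end

  def output_html
    "<pre>#{@id}: #{@sequence}</pre>"
  end
end

entry = Entry.new('seq1', 'ATGC')
format = :fasta

# Both coding styles give the same string.
puts entry.output(format) == entry.output_fasta  # prints "true"

# This sketch defines no output_rdf, so the check reports it as missing.
puts entry.respond_to?(:output_rdf)              # prints "false"
```

With this layout, adding a new format means adding one output_<format>
method; callers that go through output(format) need no change.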

Toshiaki