[BioRuby] Bioruby HTML output

Sun Jan 17 13:54:41 UTC 2010

Hi Tomoaki,

Thanks for your responses. I really appreciate them.

On Sun, Jan 17, 2010 at 02:12:35PM +0900, Tomoaki NISHIYAMA wrote:
> A user can write:
>
> class HTMLString < String
>   def to_html
>     self
>   end
> end
>
> a = Bio::Alignment.new
> a.add_seq('ATCCATGG',
>   HTMLString.new('<a href="http://example.com/path/to/original/seqinfo"><em>a</em></a>'))

There is at least one 'problem' with this approach.

This assumes that Bio::Alignment will keep its current implementation.
Currently Bio::Alignment stores a list of descriptions, and a list of
sequences. As Naohisa wrote me two weeks ago, this design dates from
before Bio::Sequence had its own identifier/descriptor. If we redesign
Bio::Alignment there is a large chance we will store Bio::Sequence
objects instead of two lists (I, for one, would certainly favour that).
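
To illustrate the difference, here is a rough sketch in plain Ruby. It
does not use the real Bio::Alignment internals; SeqEntry is a
hypothetical stand-in for Bio::Sequence, and the identifiers are made up.

```ruby
# Sketch only: plain Ruby stand-ins, not the real Bio::Alignment internals.

# Layout described above: descriptions and sequences kept side by side.
descriptions = ['seq1', 'seq2']
sequences    = ['ATCCATGG', 'ATGCATGC']

# Possible redesign: each entry carries its own identifier.
# SeqEntry is a hypothetical stand-in for Bio::Sequence.
SeqEntry = Struct.new(:id, :seq)
entries = descriptions.zip(sequences).map { |id, seq| SeqEntry.new(id, seq) }

entries.each { |e| puts "#{e.id}\t#{e.seq}" }
```

Code that smuggles a special String subclass in as the description
depends on the first layout and would need changing under the second.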

The other problem is more about OOP. In your example the HTML nature of
the object is stated twice: once by the class (HTMLString) and again by
the HTML-specific method 'to_html'. The name 'to_html' also implies a
transformation, which is not what happens here. We should opt for a
different method name (generate_html, perhaps, or html):

class HTMLString
  def html
  end
end

The 'responsibility' of the output is with HTMLString. Good. This way an
implementation of Bio::Alignment does not need to know about HTML,
but still can generate the output, at the user's request.
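
A minimal sketch of that division of responsibility, assuming HTMLString
subclasses String as in your example. Row is a hypothetical stand-in for
an alignment entry and is not BioRuby code.

```ruby
# Sketch only: HTMLString follows the example quoted above;
# Row and the output line below are hypothetical, not part of BioRuby.
class HTMLString < String
  # The string already holds markup, so return it unchanged.
  def html
    self
  end
end

# Stand-in for an alignment entry: it only stores what it is given
# and knows nothing about HTML.
Row = Struct.new(:sequence, :description)

row = Row.new('ATCCATGG', HTMLString.new('<em>a</em>'))

# Only the HTML output code asks the description for its markup.
puts "<td>#{row.description.html}</td><td>#{row.sequence}</td>"
```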

> # this is html under the responsibility of the programmer
>
> a.add_seq('ATGCATGC', '<b>')
> # this is not html; don't care on '<', or '>'
>
> simple = Bio::Html::HtmlAlignment.new(a,
>   :title => HTMLString.new('A <em>fancy</em> <b>HTML</b> <i>title</i>'))
> html = simple.html()
>
> If Bio::Alignment does not force the object given to be String,
> such code should be possible without the change in Bio::Alignment,
> and only the HtmlAlignment class and the programmer needs to know it.
> So, HTML specific code does not need go to regular BioRuby code.

HTMLAlignment should not care either how the HTML is generated. It is
really up to the container holding the sequence, or description, what
the output is.

What I don't like about the proposed approach is that HTMLAlignment gets
an object, needs to check for a 'to_html' or 'html' method (ugly), and
if it does not exist, needs to escape the information (by calling the
to_s method?). That is a lot of formal checking to do for every piece
of output generated.
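
For reference, the check I mean would look roughly like this. It is a
sketch: description_to_html is a hypothetical helper, and I use
ERB::Util.html_escape from the standard library as a stand-in for
whatever escaping routine we end up with.

```ruby
require 'erb'

# As in the quoted example: a String that declares itself to be HTML.
class HTMLString < String
  def to_html
    self
  end
end

# Hypothetical helper showing the test every output call would need.
def description_to_html(obj)
  if obj.respond_to?(:to_html)
    obj.to_html                       # trusted markup, passed through
  else
    ERB::Util.html_escape(obj.to_s)   # anything else is escaped
  end
end

puts description_to_html(HTMLString.new('<em>a</em>'))
puts description_to_html('<b>')
```

Every place that emits a description has to go through a helper like
this, which is the downstream checking I would like to avoid.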

>> That would be the proper way to handle it. No testing of methods
>> (like to_html), but use the object structure to define what is
>> supported (and not).
>
> I'm not sure what you mean by "use the object structure".
> How do you distinguish a plain text and HTML text?

The output is generated by an HTML-aware container. We can agree to
use a single 'html' method.

Create different types of objects:

  HTMLSequence.html - generates formatted HTML
  ColorHTMLSequence.html - generates formatted color HTML
  EscapedHTMLSequence.html - generates escaped native text

And if someone wanted it, he could create:

  Sequence.html  - generates plain text

This would prevent downstream 'checking' of object responsibilities.
We can assume the user knows he is going to use HTMLAlignment and
therefore we can expect him to pass in a known HTML supported
Sequence object.
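
A rough sketch of how that could fit together. The class names come from
the list above, but none of them exist in BioRuby yet; the constructor
arguments and the table layout are my own assumptions, and
ERB::Util.html_escape stands in for the escaping routine.

```ruby
require 'erb'

# Hypothetical classes named after the list above; none exist in BioRuby.
class HTMLSequence
  def initialize(seq, description)
    @seq = seq
    @description = description
  end

  # Description is trusted markup and is emitted as given.
  def html
    "<tr><td>#{@description}</td><td>#{@seq}</td></tr>"
  end
end

class EscapedHTMLSequence < HTMLSequence
  # Description is plain text and is escaped before output.
  def html
    "<tr><td>#{ERB::Util.html_escape(@description)}</td><td>#{@seq}</td></tr>"
  end
end

# Hypothetical writer: calls html on every row, with no respond_to? checks.
class HTMLAlignment
  def initialize(rows)
    @rows = rows
  end

  def html
    "<table>\n#{@rows.map(&:html).join("\n")}\n</table>"
  end
end

rows = [
  HTMLSequence.new('ATCCATGG', '<em>a</em>'),
  EscapedHTMLSequence.new('ATGCATGC', '<b>')
]
puts HTMLAlignment.new(rows).html
```

The choice between raw and escaped output is made once, when the user
picks the class, and HTMLAlignment never has to inspect its rows.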

The reason to get the responsibility in the right place is to keep the
code as clean as possible. You really don't want downstream checking
of methods.

We can further discuss in Japan. At least it is clear we have several
options.

Pj.

> -- 
> Tomoaki NISHIYAMA
>
> Advanced Science Research Center,
> Kanazawa University,
> 13-1 Takara-machi,
> Kanazawa, 920-0934, Japan
>
>
> On 2010/01/16, at 17:30, Pjotr Prins wrote:
>
>> On Sat, Jan 16, 2010 at 02:36:02PM +0900, Tomoaki NISHIYAMA wrote:
>>>> I am going to add a 'master' switch for escaping of HTML. The
>>>> default will be with escaping.
>>>
>>> What do you think about testing whether the object responds to
>>> to_html, calling to_html if it does, and otherwise passing it to
>>> escapeHTML?
>>
>> In this case the object to convert to HTML is a String and part of
>> Bio::Alignment. Later implementations of Bio::Alignment could use a
>> Bio::Sequence.id (or something Naohisa wrote me).  It would mean we
>> would have to create a Bio::Sequence::Descriptor object, which would
>> contain several specialised 'output' generators.
>>
>> This is a recurrent idea we need to discuss.
>>
>> I think *all* HTML based stuff should be in its own objects - and its
>> own tree (I have created bio/output/html for that purpose).
>>
>> I think it is a bad idea to clutter regular BioRuby code with HTML
>> specific stuff. Likewise for other outputs, as you pointed out, like
>> plotting. Output should live in
>>
>>   bio/lib/output/html
>>   bio/lib/output/plot
>>   bio/lib/output/gtk
>>   bio/lib/output/rails (perhaps)
>>   (etc)
>>
>> that way display code never pollutes the simple Bio::Sequence object,
>> for example. You'll get Bio::Html::Sequence for that - or my
>> preferred naming Bio::HtmlSequence.
>>
>> Now if Bio::HtmlSequence could be plugged into Bio::Alignment - the
>> latter would not care - and we could adapt the HtmlSequence info to
>> show embedded hrefs.
>>
>> That would be the proper way to handle it. No testing of methods
>> (like to_html), but use the object structure to define what is
>> supported (and not).
>>
>> Until we implement that (get Bio::Alignment to support arbitrary
>> Sequence objects) I think the master switch is fine. I have updated
>> my branch. Default behaviour is escaping. If a user (like me) wants
>> it otherwise, it is allowed.
>>
>> Pj.
>>
>
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby