[BioRuby] Bioruby HTML output

Wed Jan 13 06:57:11 UTC 2010

Hi, Happy New Year!

> For security, I'd like to ask security experts.
> Anyone in this list?

Though I am not an expert, in a Japanese blog,
http://takagi-hiromitsu.jp/diary/20051227.html
Hiromitsu Takagi writes the reason why escaping should be default at  
the output point,
from a security points, which sounds me reasonable, though I do not  
know an english
literature.

In addition,

> * Different escaping rules are needed for different output types,
>   e.g. SQL, html, XML, TeX, GNUPLOT, PostScript, R scripts.
>   Escaping by output methods seems natural, and helps to switch
>   output formats without concerning escaping issues specific
>   to each output format.

this is a good argument.
If html tag containing title is necessary, a non-default API that  
does accept
html marked text rather than the normal text should be considered.
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan

On 2010/01/13, at 11:58, Naohisa GOTO wrote:

> Hi,
>
> On Tue, 12 Jan 2010 11:11:32 +0100
> Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
>
>> On Tue, Jan 12, 2010 at 06:29:57PM +0900, Naohisa GOTO wrote:
>>> I'm not sure whether the prefix Bio::Html is suitable or not.
>>
>> Me neither ;). This is something to discuss when we meet. See my
>> write up on partitioning based on functionality or standards.
>>
>>> By the way, I'v tried some of your code in
>>> http://github.com/pjotrp/bioruby/blob/color-alignment/
>>> and found potential XSS.
>>>
>>>   a = Bio::Alignment.new
>>>   a.add_seq('ATCCATGG', '<script>alert("a");</script>')
>>>   a.add_seq('ATGCATGC', '<script>alert("b");</script>')
>>>   a.add_seq('<script>alert("c");</script>', 'c')
>>>   simple = Bio::Html::HtmlAlignment.new(a,
>>>           :title => '<script>alert("title");</script>')
>>>   html = simple.html()
>>>   File.open('/tmp/xss.html', 'w') { |w| w.print html }
>>>
>>> For sequences, sequence names, and consensus lines,
>>> using CGI.escapeHTML() will always be needed.
>>>
>>> For the :title, if script users can set the title, it
>>> should be escaped, but this prevents script programmers
>>> using html tags in the title.
>>
>> Perhaps the HTML generator should escape its output. Though I
>> personally think we should only be worried about security concerns
>> when people *enter* new data on input forms. That is when exploits
>> show up. I can argue that HTML generation should not concern itself
>> with HOW the inputs are presented. One advantage of having a
>> programmer set the 'title' is that he *can* embed HTML. Perhaps
>> escaping HTML is the responsibility of the programmer providing the
>> data. And therefore to the logic that handles input.
>
> Even apart from security, sequence names (and sequences) that
> contain html special characters may not be correctly displayed.
>
> For example, sequences with three parameters a, b, and c.
>
> % cat test.aln
> CLUSTAL 2.0.9 multiple sequence alignment
>
>
> 1<a<3_b>5_c<7       FKNVFTVIKTEFNSHRVKIDSHFHGIWIAGAPPEGTDVYIKWFLQ
> a>3_5<b<8_c>11      FKNVMSVIKTEFNSHRVKIDSHFHGIWIAGAPPEGTDVYIKTFLQ
>                     ****::*********************************** ***
> % irb -r bio
> irb> report = Bio::ClustalW::Report.new(File.read('test.aln'))
> irb> alignment = report.alignment
> irb> simple = Bio::Html::HtmlAlignment.new(alignment, :title =>  
> 'a,b,c')
> irb> File.open('abc.html', 'w') { |w| w.print simple.html() }
>
> The sequence names were correctly treated by ClustalW 2.0.9,
> but unexpected representation.
>
> This problem can not be solved with input data escaping.
> If the sequence name "1<a<3_b>5_c<7" is escaped to
> "1&lt;a&lt;3_b&gt;5_c&lt;7" before calling the method,
> text indentation will be broken because of the mismatch of
> text length and html display width. To solve this, to
> escape when building the html format by output formatting
> method will be needed.
>
>> We have had a similar discussion before. We have to decide to what
>> level *output* code should concern itself with *input* security. I
>> have a feeling that too much of Bioruby classes try to do too much.
>> How do we stay away from cluttering the code? How do we decide that
>> callers should not use HTML and handle security concerns?
>
> It is difficult not to use HTML-like string which we want
> to be treated as normal unformatted string but unexpectedly
> treated as HTML by some programs, e.g. the above example.
>
> For security, I'd like to ask security experts.
> Anyone in this list?
>
> I think escaping should be done by formatting layer and
> should be turned on by default, because:
> * Only the output formatting layer knows how the input data
>   is processed.
> * In many cases, the data comes from outside, and we can not
>   expect it is safe enough.
> * Different escaping rules are needed for different output types,
>   e.g. SQL, html, XML, TeX, GNUPLOT, PostScript, R scripts.
>   Escaping by output methods seems natural, and helps to switch
>   output formats without concerning escaping issues specific
>   to each output format.
>
>> You write:
>>
>>>   a.add_seq('ATCCATGG', '<script>alert("a");</script>')
>>
>> If a programmer wants that - it is his concern in my opion. If he is
>> concerned about exploits he should not allow it. The Alignment class
>> does not care either. It is none of its business.
>
> The example is extreme case. For security, please ask experts.
> Apart from the security, I wish ">", "<", "&", etc. can be
> displayed correctly. I think methods to build HTML format
> should concern this.
>
>> BTW I fixed a number of PAML::Codeml bugs on this branch. So you
>> can ignore the existing PAML branch. Let's continue with the color
>> coding, assuming you can live with the PAML::Codeml implementation,
>> as it stands.
>
> When do you want the Bio::PAML::Codeml code to be merged to the
> blessed bioruby repository?
>
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>