[BioRuby] Codeml parser

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Mon Jan 4 07:15:18 UTC 2010


Hi,

I also think the current Bio::PAML::Codeml::Report needs to be
rewritten. It would be great if you did so. Here are my comments.

>  codeml = Bio::PAML::Codeml.new(nil, :runmode => 0, :RateAncestor => 1,
>                                      :alpha => 0.5, :fix_alpha => 0)
>  report = codeml.query(alignment, tree)
>
> which, as it happens, works. The 'nil' points to the program executable.
> 'nil' merely fills in 'codeml'. It would have been beter to make it one
> of the listed options, e.g. :binary => 'codeml'. That would save the ugly
> 'nil' parameter and belongs more to the principle of least surprise, that
> makes Ruby shine.

It is safer not to merge BioRuby's internal options with PAML's options.
If the upstream authors of PAML introduced a new option named binary,
the two names would collide and a severe problem would occur.

One way is to write code that acts somewhat like C++ polymorphism.
For example, the code below accepts the following three cases.
* Bio::PAML::Codeml.new("/path/to/codeml")
* Bio::PAML::Codeml.new({ :xxx => yyy, :ppp => qqq })
* Bio::PAML::Codeml.new("/path/to/codeml", { :xxx => yyy, :ppp => qqq })

  def initialize(*argv)
    program = nil
    params = {}
    case argv.size
    when 0, 1
      begin
        params = argv[0].to_hash
      rescue NoMethodError
        program = argv[0]
      end
    when 2
      program, params = *argv
    else
      raise ArgumentError, "wrong number of arguments (#{argv.size} for 2)"
    end
    # continues to the current code...
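
A self-contained sketch of the same dispatch, for trying the three call
forms in isolation. CodemlArgsSketch is a stand-in class, not the actual
Bio::PAML::Codeml, and the 'codeml' default is only assumed from the
quoted text. It checks respond_to?(:to_hash) instead of rescuing
NoMethodError, which is one possible variation rather than a required
change.

```ruby
# Stand-in class for illustration only; not the real Bio::PAML::Codeml.
class CodemlArgsSketch
  attr_reader :program, :params

  # Assumed default program name (the quoted text says nil means 'codeml').
  DEFAULT_PROGRAM = 'codeml'

  def initialize(*argv)
    program = nil
    params = {}
    case argv.size
    when 0
      # no arguments: keep the defaults
    when 1
      # a single argument is either a parameter hash or a program path
      if argv[0].respond_to?(:to_hash)
        params = argv[0].to_hash
      else
        program = argv[0]
      end
    when 2
      program, params = *argv
    else
      raise ArgumentError,
            "wrong number of arguments (#{argv.size} for 0..2)"
    end
    @program = program || DEFAULT_PROGRAM
    @params = params
  end
end

a = CodemlArgsSketch.new("/path/to/codeml")
b = CodemlArgsSketch.new({ :runmode => 0 })
c = CodemlArgsSketch.new("/path/to/codeml", { :runmode => 0 })
```

Here a.program is the given path with empty params, b.program falls back
to the assumed default, and c carries both values.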

The bad points are:
* The complexity of the code is increased.
* It might make the code difficult to refactor, especially when keyword
   arguments are introduced in a future version of Ruby.

Note that Ruby's author Matz has said that he did not apply the
principle of least surprise to the design of Ruby.
(http://en.wikipedia.org/wiki/Ruby_(programming_language)#Philosophy )
Please be aware that the phrase "principle of least surprise (POLS)"
is one to avoid when you make a request about Ruby.
(http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/26942 )

>  A new implementation of Bio::PAML::Codeml::Report

> So I propose to rewrite the class supporting for multiple models,
> with the following usage (starting from a codeml report - really result):
>
> >> report.models.size
> => 2
> >> report.models[0].name
> => "M0"

I suppose report.models returns a Hash containing objects of a newly
written class (for example, Bio::PAML::Codeml::Report::Model) or a Struct.
That seems good.

Existing methods could be changed to return the first model's values.
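
A rough sketch of how that could look. Everything here is hypothetical:
ReportSketch stands in for the report class, the lnL attribute and its
numbers are made up for illustration, and the models are held in an
Array so that the quoted report.models[0] call works as written.

```ruby
# Hypothetical sketch; not the real Bio::PAML::Codeml::Report.
class ReportSketch
  # One record per model; :lnL is an illustrative attribute name.
  Model = Struct.new(:name, :lnL)

  attr_reader :models

  def initialize(models)
    @models = models
  end

  # Existing-style accessor: delegates to the first model, so code
  # written for single-model reports keeps working.
  def lnL
    first = models.first
    first && first.lnL
  end
end

report = ReportSketch.new([
  ReportSketch::Model.new("M0", -1.0), # made-up values
  ReportSketch::Model.new("M3", -2.0)
])

report.models.size     # => 2
report.models[0].name  # => "M0"
report.lnL             # same as report.models[0].lnL
```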

> Unit tests

Currently, tests with external dependencies (e.g. web services) are
located in the test/functional/ directory. So, your tests running
codeml would be named test/functional/bio/appl/paml/test_codeml.rb,
test/functional/bio/appl/paml/codeml/test_report.rb, or something
like this.

> These tests, for example, can be run on a special switch:
>
>  runner.rb --test-dependencies

I'm now looking for ways to pass such parameters to tests.
Note that tests can also be run in various ways. For example,
  ruby test/unit/bio/appl/paml/codeml/test_report.rb 
  testrb test/unit/bio/appl/paml/codeml
  rake test
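
One candidate that would work with all of these invocations is an
environment variable, since it reaches the test code however the tests
are started. The sketch below only illustrates the idea; the helper name
and the variable name BIORUBY_TEST_DEPENDENCIES are assumptions, not
anything BioRuby defines.

```ruby
# Hypothetical helper; BIORUBY_TEST_DEPENDENCIES is an assumed name.
# Returns true when tests needing external programs should be run.
def run_dependency_tests?(env = ENV)
  %w[1 true yes].include?(env['BIORUBY_TEST_DEPENDENCIES'].to_s.downcase)
end

# A test file that needs the codeml binary could then guard itself:
#   if run_dependency_tests?
#     ... define the test cases ...
#   end
```

It would be used as, for example,
  BIORUBY_TEST_DEPENDENCIES=1 rake test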

> I am sure it works, but doesn't anyone think this belongs in a support
> module (e.g. BioTestFile) for testing? What I would like to see is
> something less brittle:
>
>  require 'bio/test'
>  str = BioTestFile::read('paml/codeml/output.txt')

I'd like to keep tests simple and clear, and I think using the standard
File.read is enough and clearer. With such a special class, one has to
read an extra file to understand the behavior of the test code.

> Personally, I dislike the naming/name space scheme of Bioruby.
> What to think of invoking a class named
>
>  report = Bio::PAML::Codeml::Report.new

Because there are many bioinformatics software packages and databases,
names tend to be longer, and namespaces tend to be nested more deeply.
I'd like to know the naming rules and policies of other open-bio projects.

> Why can't it just be
>
>  include Bio
>  report = Codeml.new

I think it is enough to write "include Bio::PAML" instead of (or in
addition to) "include Bio".
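
A small sketch of the effect, using empty stand-in modules so that it
runs without BioRuby installed. MyScript and its method are hypothetical
names. The include is done inside a module here to avoid touching the
top-level namespace; the same include line also works at the top level
of a script.

```ruby
# Stand-ins with the same nesting as the real classes (bodies omitted).
module Bio
  module PAML
    class Codeml
      class Report; end
    end
  end
end

# Hypothetical user code.
module MyScript
  include Bio::PAML

  # Inside this module the short name Codeml resolves through the
  # included Bio::PAML namespace.
  def self.report_class
    Codeml::Report
  end
end

MyScript.report_class  # the same class as Bio::PAML::Codeml::Report
```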

>  include Bio
>  result = Paml.new(:program => 'codeml')

I don't like introducing a new parameter such as :program.
I think one class per binary is better.

In addition, because the differences among the PAML tools (codeml, baseml,
yn00, etc.) are currently not small, merging the classes is not very
realistic now.

On Thu, 31 Dec 2009 15:15:46 +0100
Pjotr Prins <pjotr.public14 at thebird.nl> wrote:

> Hi Michael,
> 
> I have a writeup on improving the current PAML functionality. Are you
> OK with this?
> 
>   http://bioruby.open-bio.org/wiki/BIORUBY_PAML
> 
> (maybe it does not belong on the bioruby Wiki - but I think of it
> like a 'design' document).
> 
> Pj.


Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org