[Biopython-dev] Properties names in command line wrappers

Tue May 5 13:58:04 UTC 2009

On Tue, May 5, 2009 at 1:36 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Could we support both the original argument and optional human
> readable arguments? I know the code in Application is a bit
> hard coded for the first argument as the real name and the last
> argument as the readable name; the cleanest solution would be to
> generalize this to have multiple names where it makes sense.

You mean for these BLAST examples, create two properties "a" and
"nprocessors", both controlling the "-a" parameter, and also two
properties "A" and "window_size" both controlling "-A"?  From a code
point of view, this would be moderately straightforward - but I'm not
convinced about this.

> More practically, it always makes sense to have the low level
> standard arguments from the program itself. Even if it is
> non-intuitive like BLAST's switches, people who already understand
> the program can just use their existing knowledge without any
> specific knowledge of Biopython.

Yes :)

Personally I initially found it very frustrating when using the
Bio.Blast.NCBIStandalone.blastall wrapper because the NCBI switches
had all been given friendly names, and it wasn't clear without looking
at the source code what mapped to what.  As a minor change, I think
the Bio.Blast.NCBIStandalone.blastall docstring should actually
include the real NCBI switch used by each Biopython keyword.

> Where someone wants to support more useful names, they can
> add those in.

So that we cater to those familiar with the NCBI command line
arguments, but also give a more human alternative?  On the downside,
it means there are two ways to set these parameters.  Also, if we go
down this route, then for consistency across all command line wrappers
we may want to invent more human readable aliases (if the tool
arguments are too cryptic).  We are also opening up a potential
problem if the tool later adds a new argument whose name clashes with
one of our inventions.  Also, would we care about the lack of
consistency between tools (e.g. infile versus input?), and should we
try to be consistent in our new names?

I favour using only a single property for each parameter, with the
name as similar as possible to the actual command line switch (i.e.
property name "a" for "-a", not "nprocessors").  Note each property
would have a docstring saying what it is for ("Number of
processors to use.").
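
As a rough illustration of the single-property idea (a minimal sketch,
not the actual Bio.Application code; the names ToyWrapper,
_make_property and _values are made up for this example), a property
named after the switch can carry the explanatory docstring:

```python
def _make_property(switch, doc):
    """Build a property that stores its value under the given switch.

    Hypothetical helper for illustration only.
    """
    def getter(self):
        return self._values.get(switch)

    def setter(self, value):
        self._values[switch] = value

    return property(getter, setter, doc=doc)


class ToyWrapper(object):
    """Toy command line wrapper with one property per switch."""

    # Property name "a" mirrors the real switch "-a"; the docstring
    # tells the user what the switch does.
    a = _make_property("-a", "Number of processors to use.")

    def __init__(self):
        self._values = {}


wrapper = ToyWrapper()
wrapper.a = 4
```

With this, help(ToyWrapper.a) would show the description, so the short
name stays discoverable without inventing a second, longer alias.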

In the case of the existing blastall wrapper in
Bio.Blast.Applications, I would change names=["-a", "nprocessors"]
to ["-a", "nprocessors", "a"], meaning "a" (last entry) would be the
property name used, "-a" (first entry) would be used for the actual
command line string.  I would keep the "nprocessors" alias for
backwards compatibility only - all three aliases would be available to
the (legacy) method set_parameter.
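
To make the names list convention concrete, here is a minimal sketch
under my own assumptions (the Option and ToyBlastall classes are
hypothetical stand-ins, not the real Bio.Blast.Applications code, and
the "-A" description is a placeholder): the first entry is the switch
written to the command line, the last entry is the property name, and
any entry is accepted by set_parameter.

```python
class Option(object):
    """Hypothetical parameter holding a list of names and a value."""

    def __init__(self, names, description):
        self.names = names
        self.description = description
        self.value = None

    @property
    def switch(self):
        # First entry is what goes on the command line.
        return self.names[0]

    @property
    def property_name(self):
        # Last entry is the name exposed as a property.
        return self.names[-1]


class ToyBlastall(object):
    """Toy wrapper showing alias lookup in a legacy-style setter."""

    def __init__(self):
        self.options = [
            Option(["-a", "nprocessors", "a"],
                   "Number of processors to use."),
            Option(["-A", "window_size", "A"],
                   "Window size (placeholder description)."),
        ]

    def set_parameter(self, name, value):
        # Any alias in the names list is accepted here.
        for option in self.options:
            if name in option.names:
                option.value = value
                return
        raise ValueError("Option name %r was not found." % name)

    def __str__(self):
        parts = ["blastall"]
        for option in self.options:
            if option.value is not None:
                parts.append("%s %s" % (option.switch, option.value))
        return " ".join(parts)


cline = ToyBlastall()
cline.set_parameter("nprocessors", 2)  # old alias still works
cline.set_parameter("a", 4)            # new name overrides the same value
cline.set_parameter("-A", 40)          # the raw switch works too
```

Because all three names resolve to the same Option, existing scripts
using "nprocessors" keep working while new code can use "a".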

> You have been digging around in this so probably have a good idea
> how hard this is to implement practically. If it's a pain, I'd argue
> to just have the original arguments now, and the useful names can go
> on a todo list.

It is certainly possible, although probably a bit tedious due to
changing the "boilerplate" code.

Peter