Pise/EMBOSS 2.3.1

David Mathog mathog at mendel.bio.caltech.edu
Tue Apr 23 18:25:34 UTC 2002


> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.

Sure they can.  That's why thousands of hours are being spent wrapping GUIs
around programs so that users don't have to (horrors) log on or (gasp) type
a command line.

Back to the subject at hand. (And this is stream of consciousness, so please
bear with me.) I think that for purposes of interface design there should be
predefined methods to break out all the pieces/options of a USA.  (Perhaps
even reduced to Perl and C modules in the EMBOSS distribution so that
W2H/Pise/etc. don't need to be rewritten for each EMBOSS release.) Consider
something like this:

 program -sequence=genbank:\*

That never translates well into a GUI, because the end user has to know the
full USA syntax and especially that "*" is a wild card.  Often enough, they
don't understand these concepts.  And even if they do, they may not be able
to use certain aspects of that syntax on a given server (for instance, files
and paths, or particular databases).  So it falls to the GUI to put some glue
in between the USA and the user.  The two main web interfaces for EMBOSS take
opposite paths in this regard.  Pise hides the USA completely and W2H allows
the user to manipulate USAs through a tool.

In W2H you generally have to build the USAs ahead of time through a separate
window and store them in a list, then you select one or more USAs from the
list when you run the program.  (USAs can also generally be typed into the
slots within the program, if the user knows what he/she is doing.)

In Pise you can enter a database USA like "genbank:dmwhite" (but it isn't
called a USA), but entering "genbank:*" doesn't work (for instance, with
compseq).  Pise isn't really designed to handle wild cards because it's going
to try to extract that whole sequence from the database, save it in a file,
and then run the program on that file.  This is consistent with its typical
"upload data for each program" design.  Pise only ever runs programs with the
"simple file" sort of USA.  So perhaps it's just as well that "genbank:*"
doesn't work at the moment!!!  To get around this wildcard limitation Pise
would have to be reworked enough to recognize wildcards (and USAs in general)
and slot them onto the command line without first extracting the sequences
they refer to.
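
As a rough sketch of that last point (this is not how Pise works today, and
the function names and the choice of Python are my assumptions): a wrapper
could check a user-supplied string for wildcard characters and, if it finds
any, hand the string to the program as a single argv element with no shell
involved, so the "*" needs no backslash.  Treating both "*" and "?" as
wildcards is also an assumption; compseq and -sequence are just the examples
used above.

```python
import shlex

# Assumption: these are the characters a site wants to treat as wildcards.
WILDCARD_CHARS = set("*?")


def has_wildcard(usa):
    """Return True if the USA string contains a wildcard character."""
    return any(ch in WILDCARD_CHARS for ch in usa)


def build_argv(program, usa):
    """Build an argument vector; the USA is passed through untouched.

    Because this is a list of arguments and not a shell command line,
    nothing will try to expand the "*" against file names.
    """
    return [program, "-sequence=" + usa]


def shell_preview(argv):
    """Render argv as a shell-safe string, e.g. for a log or a help page."""
    return " ".join(shlex.quote(arg) for arg in argv)


argv = build_argv("compseq", "genbank:*")
print(has_wildcard("genbank:*"))
print(shell_preview(argv))
```

The list returned by build_argv() could be handed to something like
subprocess.run(argv) directly; shell_preview() is only there to show the
user what the equivalent typed command would look like.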

Anyway, what's really going on with -sequence is that all of the components
of a USA are encoded into a single string for use on the command line and
then are broken out again into separate pieces later within the program.  For
a GUI _all_ these pieces need to be broken out explicitly and displayed to the
user (who isn't expected to know anything about USAs or have to learn anything
about them or the interface).  Something like this:

format: default
database:genbank
x ALL_ENTRIES  o BY_STRING
entrystring: (blank)

From that the GUI/cgi can easily enough format a USA for the final
command line.
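
To make that concrete, here is a minimal sketch of the formatting step for
the four fields shown above.  The function and parameter names are made up
for illustration, it handles only the database case (no files, listfiles, or
multiple entries), and it assumes the usual "format::database:entry" layout
of a USA.

```python
def build_usa(database, entry="", all_entries=False, fmt="default"):
    """Assemble a USA string from the broken-out form fields."""
    if not database:
        raise ValueError("a database name is required")
    if all_entries:
        entry_part = "*"          # ALL_ENTRIES selected
    elif entry:
        entry_part = entry        # BY_STRING selected, entrystring filled in
    else:
        raise ValueError("choose ALL_ENTRIES or supply an entry string")
    usa = database + ":" + entry_part
    # Only prefix a format when the user picked something other than default.
    if fmt and fmt != "default":
        usa = fmt + "::" + usa
    return usa


print(build_usa("genbank", all_entries=True))
print(build_usa("genbank", entry="dmwhite"))
```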

But imagine using such an interface.  It's great if you just run an occasional
program but not so wonderful when you're doing something complex.  How do you
cut and paste the state of 4 (or more) USA variables from one page (=program)
to another?  That suggests to me that a GUI which always has fully broken out
USA options will probably end up being pretty awkward to use.  However, since
the purpose of the GUI is essentially to reformat the (implicit) information
in the USA, why not make that an explicit option and let it reformat in both
directions?  Then the "standard" USA GUI starts to look something like this:

[test usa] [from USA] [to USA] [use this] [abort]   <------(buttons)
USA:[  genbank:*   ]
format: default
database:genbank      <-------- (pull down list)
x ALL_ENTRIES  o BY_STRING
entrystring: (blank)

Actually it's a LOT more complicated than that, considering that it also
encompasses listfiles, multiple entries (foo.msf{one,two,three}), etc.  If the
user has a USA he/she can plug it into the GUI and that's fine.  Or they can
plug it in, translate it, and tweak it.  Or if they don't have a USA to start
with they can use this page to build one.  And this USA constructor page can
enable/disable the USA fields as appropriate for each site and/or program.
(No file access?  Can't accept list files or wild cards?  Then don't show
those USA options.  Make the database list from the output of showdb.)
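
A sketch of the [from USA] direction plus a per-site check, for the simple
database case only.  Everything here is illustrative: the function names, the
policy flags, and the idea of passing in a set of database names (which a
site might collect from showdb) are my assumptions, and the multiple-entry
syntax is not handled at all.

```python
def parse_usa(usa):
    """Split a simple USA into format, database, and entry fields."""
    fmt, sep, rest = usa.partition("::")
    if not sep:
        # No "format::" prefix was given.
        fmt, rest = "default", usa
    database, _, entry = rest.partition(":")
    return {
        "format": fmt,
        "database": database,
        "all_entries": entry == "*",
        "entrystring": "" if entry == "*" else entry,
    }


def usable_here(usa, databases, allow_wildcards=True, allow_listfiles=False):
    """Decide whether this site/program should accept the USA at all."""
    if usa.startswith("@"):
        # Assumed listfile form: "@filename".
        return allow_listfiles
    if not allow_wildcards and ("*" in usa or "?" in usa):
        return False
    # A plain file name ends up in the "database" field and is rejected
    # here, which is the "no file access" case.
    return parse_usa(usa)["database"] in databases


print(parse_usa("genbank:*"))
print(usable_here("genbank:*", {"genbank"}, allow_wildcards=False))
```

The dictionary returned by parse_usa() maps one-to-one onto the form fields
in the mockup, so the same page can be filled in from a pasted USA or used to
build one from scratch.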

The final problem is that exposing the guts of the USA will take up a lot of
screen space and complicate the program interfaces.  That's less of a problem,
though, if the GUI for any given EMBOSS program just provides a slot to plug in
a USA and some way to pop up the USA formatter window to fill in that slot
(through javascript or whatever).  The popped up formatter could then drop the
final USA back into the program's USA slot.  (Sort of like what W2H does, but
into the program's slot rather than the working list.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech



