EMBOSS 1.12.0

David Martin dmartin at gen67172.msiwtb.dundee.ac.uk
Thu Apr 19 07:55:34 UTC 2001

On Wed, 18 Apr 2001, Peter Rice wrote:

> David Martin wrote:
> > I would take all options specified in the ACD file plus any specific
> > associated parameters so that essentially everything is stored.
> Very messy for applications with a long list of (possibly hidden) options.
> Could maybe cover all the required options (the ones that get prompted
> for). I would prefer just the ones that were set (i.e. enough to build a
> command line to repeat the run)

But what if the defaults have been modified in the ACD? It is about
repeatability of experiments. Of course one could have an option to have
either a minimal header or (as is now) no header at all.

I think the current situation is poor scientific practice (i.e. the data
isn't self-documenting). We were always taught to label and date
everything (photos, spec traces, column traces, tubes in racks etc.) so you
_know_ what it is because it knows what it is.

Keeping just the command line and prompted options would be a minimal
header (along with program name and date.)
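
A rough sketch of what such a minimal header could look like. The '#'
prefix, the field labels, the function name and the program name
"exampleprog" are all made up for illustration; this is not an existing
EMBOSS format or API.

```python
from datetime import datetime, timezone
import shlex


def build_minimal_header(program, argv, prompted, when):
    """Return comment lines holding just enough to repeat a run.

    program  -- program name (hypothetical here)
    argv     -- command-line arguments as given by the user
    prompted -- dict of options the user was prompted for
    when     -- timezone-aware datetime of the run
    """
    lines = [
        f"# Program: {program}",
        f"# Date: {when.strftime('%Y-%m-%d %H:%M:%S')} UTC",
        # shlex.join quotes arguments so the line can be pasted into a shell
        f"# Command: {shlex.join([program] + list(argv))}",
    ]
    for name, value in prompted.items():
        lines.append(f"# Prompted: -{name} {value}")
    return lines


header = build_minimal_header(
    "exampleprog",
    ["-sequence", "input.seq", "-outfile", "result.out"],
    {"gapopen": "10.0"},
    datetime(2001, 4, 19, 7, 55, 34, tzinfo=timezone.utc),
)
print("\n".join(header))
```

Lines like these could sit at the top of any output file and be stripped
again with a one-line filter on the leading '#'.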

For full record keeping a full record should be kept. This can easily be
parsed out by any vaguely competent Unix hack and allows for proper

> > XML output as a standard option for all programs would be a very nice
> > thing. It requires some reworking of how results are handled though to
> > actually think 'object' rather than 'essay'.
> Do you have some standard XML in mind? Whenever I look into this I see a
> forest of DTDs and no clear standard. Of course, we could invent one...

Most of the DTDs out there are crap. I would go with one of our own to
start with. If designed well this can easily be transformed into any
format desired (HTML etc.) thus removing from the application writer the
need to cope with every possible format. A post-process transformation can
allow the user to specify an XSL stylesheet of choice (or DSSSL if that
route is chosen).

If EMBOSS follows this scheme:

ACD parse
Do the Science
Generate output in XML (full record)
Transform to the user's desired form

Then you can have the 'classical' forms of output, i.e. plain text,
HTML, short XML output, full (untransformed) output, SQL, MS Word (only
kidding), graphical output in whatever form (How about moving EMBOSS
graphics from PLplot to SVG?)

It adds a layer of output independence to EMBOSS and provides a standard
interface for external apps to pick up EMBOSS output (think of all the fun
you are having in dealing with unstandardised output models for the
various programs and the joy of James Bonfield getting EMBOSS into SPIN.)

This is not really very different to what happens with sequences, which
have their own internal representation and are transformed to the required
format on output.
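
The scheme above (full XML record first, transformation afterwards) might
be sketched roughly as follows. This is only an illustration under stated
assumptions: the element names (run, options, option, results, result),
the function names and the program name "exampleprog" are invented, and
plain Python functions stand in for the XSL stylesheets a real setup
would more likely use.

```python
import html
import xml.etree.ElementTree as ET


def make_full_record(program, options, results):
    """Build the full XML record: every option plus every result."""
    root = ET.Element("run", {"program": program})
    opts = ET.SubElement(root, "options")
    for name, value in options.items():
        ET.SubElement(opts, "option", {"name": name}).text = str(value)
    res = ET.SubElement(root, "results")
    for name, value in results.items():
        ET.SubElement(res, "result", {"name": name}).text = str(value)
    return root


def to_text(root):
    """'Classical' plain-text output: program name, then the results."""
    lines = [f"Program: {root.get('program')}"]
    for r in root.iter("result"):
        lines.append(f"{r.get('name')}: {r.text}")
    return "\n".join(lines)


def to_html(root):
    """HTML table of the results, with values escaped."""
    rows = "".join(
        f"<tr><td>{html.escape(r.get('name'))}</td>"
        f"<td>{html.escape(r.text)}</td></tr>"
        for r in root.iter("result")
    )
    return f"<table>{rows}</table>"


def to_xml(root):
    """Full, untransformed record."""
    return ET.tostring(root, encoding="unicode")


# The application only ever builds the record; the user picks the format.
TRANSFORMS = {"text": to_text, "html": to_html, "xml": to_xml}


def render(root, fmt):
    return TRANSFORMS[fmt](root)


record = make_full_record("exampleprog", {"gapopen": 10.0}, {"score": 42})
print(render(record, "text"))
```

The point of the design is that adding a new output format means adding
one entry to the transform table, with no change to the code that does
the science.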

