suggestion for ACD syntax

José R. Valverde jrvalverde at cnb.uam.es
Mon Feb 14 10:16:58 UTC 2005


This is long, but please read to the end: I raise several serious points
and some important concerns.

On Sat, 12 Feb 2005 18:28:24 +0100
Guy Bottu <gbottu at ben.vub.ac.be> wrote:
> 	Dear developpers,
> 
> Yesterday I had a discussion with Marc Colet, developper of the EMBOSS 
> interface wEMBOSS and we have a suggestion for a small extension of the 
> ACD syntax. It would be nice if the parameter type "infile" had an 
> attribute "extension", so that the program would only accept input files 
> with a name ending with ... This would perhaps not be so useful at the 
> command line, but in a GUI this would allow for a selector with a filter 
> showing only the appropriate files.
>
That looks interesting. The problem I see is that users follow no standard
naming convention. Most often they come from Mac/MSW environments, where the
system takes care of this transparently (sort of) for them, so they are used
to typing names with no extensions. Furthermore, some packages encourage this
behaviour (e.g. Phylip), or use extensions ambiguously (e.g. ".fasta" both for
a FastA-formatted file and for a FastA result listing).

Thus, either we adopt a standard naming convention, and force all GUIs to
adopt it and do so transparently (i.e. removing the extension from the name
on listings) or it may become a serious problem. Even so, data transport from
other packages may be a problem.

OTOH, if a GUI is to impose naming conventions and do so transparently to
the user, then it may as well offer the selection as a menu in the 'open'
box, just like Netscape, MSW, and many others do (i.e. under the filename
offer "Show only sequences (.seq, .aa, .nt, .gcg, .fasta...)"/"Show all
text files..."/"Show all files (*)"...).

This raises a side issue that should be obvious. Let's try to follow the
example given:

	"Show only sequences (.seq, .gcg, .fasta, .pir, .abi, .nrl3d, .embl,
.genbank, .swissprot, .sw, .ddbj, .and-so-on-and-on-for-a-long-very-long-
indeed-list-of-extensions...)"

The problem stems from the fact that EMBOSS automagically manages many kinds
of sequences. Then, if I state in my ACD "extension = .gcg" I am doomed,
because I will no longer handle all the formats. If I have to list them
all, I may forget some (or some may be added later).

For these reasons, should something be added, I would prefer to see "abstract
types or kinds" of possible infiles (sequence, codontable, text, image, 2D-pstruct,
3D-pstruct, na-struct, etc...) so that one could pick up *all* files of *any*
suitable extension that can be processed by a program.
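
To make the idea concrete, here is a minimal sketch of how a GUI might turn
an abstract kind into a file filter. The kind names come from the list above;
the table itself, the extension lists and the function name are my own
assumptions for illustration, not anything EMBOSS or ACD defines.

```python
from pathlib import PurePath

# Hypothetical registry: abstract kind -> extensions a GUI might associate
# with it. The lists are illustrative only, not an EMBOSS-defined table.
KIND_EXTENSIONS = {
    "sequence": {".seq", ".gcg", ".fasta", ".pir", ".embl", ".genbank", ".sw"},
    "3D-pstruct": {".pdb"},
}


def files_of_kind(filenames, kind):
    """Return the names whose extension is registered for the given kind."""
    wanted = KIND_EXTENSIONS[kind]
    return [name for name in filenames
            if PurePath(name).suffix.lower() in wanted]
```

Note that a file typed with no extension at all is never selected by such a
filter, which is exactly the concern raised earlier about users' habits.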

This leads to a proposition: MIME types or the like. I.e. what would be
really helpful for GUIs is to be able to specify types 'a la' MIME, e.g.
"x-emb-sequence/x-fasta-nt", "x-emb-msa/x-phylip", "structure/pdb",
etc.

In this way, if one is willing to accept any EMBOSS-known sequence format,
one may select by "x-emb-sequence", and if one needs to be more specific,
one can use "x-emb-msa/x-phylip" and get all the discrimination needed.
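
A rough sketch of that two-level selection, assuming types are written as
"major/minor" strings as in the examples above. The function name and the
matching rule (a bare major type accepts every minor type under it) are my
assumptions about how a GUI could implement it, nothing more.

```python
def type_matches(file_type, selector):
    """True if a 'major/minor' file type satisfies a selector.

    A selector with no '/' (e.g. "x-emb-sequence") accepts every minor type
    under that major type; a full selector must match exactly.
    """
    major, _, _minor = file_type.partition("/")
    if "/" not in selector:
        return major == selector
    return file_type == selector
```

So a program that takes any sequence would declare "x-emb-sequence", while one
that reads only Phylip alignments would declare the full type.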

Associating MIME types with files would then be left up to a) GUI
designers, and b) users. Say, if I as a user prefer to call my FASTA-formatted
sequences .aa or .nt, I can set my browser prefs to associate these with
the type I need, and a click on a ".aa" or ".nt" file would open
the appropriate program on my side (e.g. a sequence editor).
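
As an illustration only, the lookup could be as simple as the sketch below,
where user associations take precedence over whatever the GUI designer ships.
The default table, the function name and the "x-fasta-aa" subtype are
hypothetical, chosen by analogy with the examples above.

```python
from pathlib import PurePath

# Hypothetical defaults a GUI designer might ship.
GUI_DEFAULTS = {".fasta": "x-emb-sequence/x-fasta-nt"}


def resolve_type(filename, user_prefs, defaults=GUI_DEFAULTS):
    """Look up a file's type; user associations override the GUI defaults."""
    ext = PurePath(filename).suffix.lower()
    if ext in user_prefs:
        return user_prefs[ext]
    return defaults.get(ext)  # None when no association is known


# A user who names FASTA files .aa / .nt (type names are assumptions).
my_prefs = {
    ".aa": "x-emb-sequence/x-fasta-aa",
    ".nt": "x-emb-sequence/x-fasta-nt",
}
```

With this, the naming convention lives in preferences rather than in each
program's ACD file, so adding a new extension does not touch the programs.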

It also has another plus: it meshes very well with all the other
standardisation and objectization initiatives. We may come up with a fairly
complete listing and offer it to a standardisation body (OMG, whatever) as a
well-rounded proposal, which would a) benefit the whole community and b)
prevent commercial companies from locking users into their specific dialects
by offering a standard designed to be incompatible with FOSS (say, like the
MS patent on Office-XML formats).

Plus, has anybody else noticed the recent news about MS entering the field
of Bioinformatics with a huge project involving various EU universities?
How long do you think it will take before they start producing incompatible
formats and standards to wipe out EMBOSS, GCG and others?

Don't get me wrong: I am certain some conventions would be really useful,
but ~20 years of dealing with users has shown me that many of them just
don't follow the conventions, and you need to find a way to accommodate them.

				j
--
Jose R. Valverde
EMBnet/CNB


More information about the emboss-dev mailing list