[EMBOSS] Multiplatform filenames (was Re: Masking the : character?)

Peter Rice pmr at ebi.ac.uk
Mon Jun 20 09:16:35 UTC 2005


José R. Valverde wrote:

>>2005/6/17, Martin Sarachu <msarachu at biol.unlp.edu.ar>:
>>>is there any way to mask the ':' character so it is not interpreted as a
>>>delimiter for DB:sequence?

> The problem arises because the ':' is used for historic reasons as a
> carry-over from VMS where it had special meaning on pathnames. This 
> does not hold on UNIX where it is a legit character (actually ANY char
> but '/' and NULL is a legit character on UNIX). This is important as
> EMBOSS may be used on many locales, and you don't know in advance
> how a given symbol will be represented on them. Freedom comes at a 
> cost.

Strictly speaking, the problem arises because ':' has become a standard for 
bioinformatics users - though, yes, VMS was the source of the special syntax. 
It was adopted by, among others, GCG and SRS. It is also used, of course, in 
URN and URL syntax.

However, in this case there is a partial solution. Only alphanumeric 
characters are allowed in EMBOSS database names, and they must be more than 
one character in length (to avoid clashing with C: on Windows systems).
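
As a rough illustration of that rule, here is a minimal Python sketch. It 
assumes "alphanumeric" means ASCII letters and digits only; the function name 
looks_like_dbname is hypothetical and is not part of EMBOSS.

```python
import re

# Assumed rule, taken from the text above: letters and digits only,
# and at least two characters (so "C:" is never read as a database).
_DBNAME = re.compile(r"[A-Za-z0-9]{2,}")


def looks_like_dbname(prefix):
    """Return True if the text before ':' could be a database name."""
    return _DBNAME.fullmatch(prefix) is not None
```

With this check, a prefix such as "C" or "my.file" would not be mistaken for 
a database name, while "embl" would pass.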

The problem posted was not in a database name. It was the filename:id syntax, 
where a ':' appeared in the filename full path.

For a ':' in a directory name (not in the filename) we could try to catch it 
by not allowing '/' in the ID. However, that can run into problems. For 
example, PFAM uses '/' in the identifier of a sequence derived from a longer 
entry.
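
The heuristic and its weakness can be sketched in Python as follows. This is 
only an illustration of the idea, not how EMBOSS parses its input; the 
function name split_file_id and the example strings are hypothetical.

```python
def split_file_id(spec):
    """Split 'filename:id' at the last ':'.

    If the would-be ID contains '/', assume the ':' belonged to a
    directory name and treat the whole string as a filename.
    Returns (filename, id), with id set to None when no ID is found.
    """
    head, sep, tail = spec.rpartition(":")
    if not sep or "/" in tail:
        return spec, None
    return head, tail


# A ':' in a directory name is caught:
print(split_file_id("/data/run:1/seqs.fasta"))
# A PFAM-style ID containing '/' is wrongly taken as part of the path:
print(split_file_id("pfam.fasta:SOME_ID/12-130"))
```

The second call shows the problem: the ID is lost because it contains a '/'.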


> QUICK SOLUTION
> --------------
> I think that for the user it is simpler to know that ':' has a special
> meaning and should be avoided.
> 
> For the cases where the colon is generated automatically, it may be better
> to provide a renaming script that changes the colon to something else.

That would be my recommendation too.
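
A renaming script along those lines might look like the Python sketch below. 
It is a minimal example under stated assumptions: the function name 
rename_colons and the choice of '_' as the replacement are mine, and it only 
looks at one directory without recursing.

```python
import os


def rename_colons(directory, replacement="_"):
    """Rename entries in `directory` whose names contain ':'.

    Returns a list of (old_name, new_name) pairs for the entries that
    were renamed. Entries whose new name already exists are skipped
    rather than overwritten.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if ":" not in name:
            continue
        new_name = name.replace(":", replacement)
        src = os.path.join(directory, name)
        dst = os.path.join(directory, new_name)
        if os.path.lexists(dst):
            continue  # do not clobber an existing file
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed
```

Calling rename_colons(".") in a results directory would rename, for example, 
"run:1.fasta" to "run_1.fasta" and leave other files alone.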

> UI 'PRO' APPROACH
> -----------------
> For GUI writers it is probably better to "translate" any such filenames
> between the user and EMBOSS. Note the quotes around translate above: it
> is not immediate. Let me explain:
> 

> 	The trick is to create a special hidden directory on each user
> directory accessed: e.g. .myGUI-names. Then for every file make a
> suitably processed symlink on that subdirectory and call emboss through
> the symlink, sort of:

Looks like a good approach. The alternative would be to trap "bad" filenames 
and ask the user to correct them.
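
For GUI writers, the symlink idea could be sketched in Python roughly as 
below. The hidden directory name .myGUI-names comes from the quoted text; the 
function name safe_link, the '_' replacement and the relative link target are 
assumptions of this sketch. It only deals with a ':' in the file name itself, 
so a ':' in a parent directory would still be present in the returned path.

```python
import os


def safe_link(path, hidden=".myGUI-names", replacement="_"):
    """Create a colon-free symlink to `path` in a hidden subdirectory.

    Returns the path of the symlink, which can be passed on in place
    of the original file name.
    """
    directory, name = os.path.split(os.path.abspath(path))
    link_dir = os.path.join(directory, hidden)
    os.makedirs(link_dir, exist_ok=True)
    link = os.path.join(link_dir, name.replace(":", replacement))
    if not os.path.islink(link):
        # Relative target: resolved from inside the hidden directory.
        os.symlink(os.path.join("..", name), link)
    return link
```

Note that two different names can map to the same link (for example "a:b" and 
"a_b"), so a real GUI would need a less naive translation.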

regards,

Peter





