[EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines

Peter Rice pmr at ebi.ac.uk
Tue Jul 21 12:12:00 UTC 2009


Daniel Barker wrote:
> Dear Peters et al.,
> 
> EMBOSS claims not to care about whether newlines are DOS or UNIX:
> 
> 'EMBOSS programs can read in both PC and Unix text file formats, so it
> is not necessary for you to use this utility all of the time' - noreturn
> documentation.
> 
> This would certainly be good. 'The newline problem' must be the single
> biggest computational waste of time I've experienced over the years!

Indeed. We get a little caught between accepting the extra carriage
returns and the need for efficient parsing, since some users run seqret to
reformat large sequence files. Mac format (\015 only) is
horrible because the C library functions look for a line feed (\012)
unless you're running on a Mac.
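
To illustrate the trade-off, here is a rough Python sketch of reading lines
regardless of convention. It is not how EMBOSS does it; the function name
split_lines is made up for the example, and it assumes the whole input fits
in memory. The two extra passes over the data are exactly the kind of cost
that matters when reformatting large sequence files.

```python
def split_lines(data: bytes) -> list[bytes]:
    """Split on DOS (\\r\\n), old Mac (\\r) or Unix (\\n) line ends.

    Hypothetical helper for illustration only; terminators are dropped.
    """
    # Convert DOS line ends first so "\r\n" is not counted as two line ends,
    # then convert any remaining lone "\r" (old Mac) to "\n".
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = unix.split(b"\n")
    # A trailing terminator leaves one empty final element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


if __name__ == "__main__":
    # Bytes of a tab-delimited file that uses "\r" only as its line end.
    sample = b"1\t2\t3\r4\t5\t6"
    print(split_lines(sample))
```

A C parser that relies on the library line-reading functions would see the
sample above as a single line, since it contains no \012 at all.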

I will give our examples a run through after converting to PC format and
see if any others fall over.

> I've noticed a small amount of software, in the world in general, still
> uses the Mac OS 9 (and earlier) convention where newline is \015 only.
> E.g. this tab-delimited text saved from Excel 2004 for Mac:
> 
> $ od -bc Workbook1.txt
> 0000000   061 011 062 011 063 015 064 011 065 011 066
>            1  \t   2  \t   3  \r   4  \t   5  \t   6
> 0000013
> $
> 
> I expect this usage will decline, since it's in conflict with the
> convention of Mac OS X's own command-line tools (\012 only, like Linux).
> Probably the '\015 only' convention hasn't had much impact on
> bioinformatics anyway?

In my experience Mac users didn't venture out into the real world. On the
other hand, Unix users have often copied files from PCs. I used to do
the same myself, which was why I had to write noreturn in the first place.

regards,

Peter Rice


