[EMBOSS] fasta single-line sequence format?

Peter Rice ricepeterm at yahoo.co.uk
Tue Aug 27 13:44:20 UTC 2013


Suggestions please for a format name to describe fasta format with the 
sequence always on a single line

(needed for output only - it will be valid as format 'fasta' for input).
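
To make the request concrete, here is a rough sketch in Python of what such an output would look like, and of the offset-based sub-sequence lookup Niels describes below. It is an illustration only, not EMBOSS code: the function names to_single_line_fasta and subsequence are hypothetical, and the lookup assumes one byte per residue (plain ASCII) and that the caller already knows the byte offset of the sequence line.

```python
import io


def to_single_line_fasta(text):
    """Rewrite FASTA text so each record's sequence sits on one line."""
    out = []
    seq_parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new ID line closes the previous record's sequence.
            if seq_parts:
                out.append("".join(seq_parts))
                seq_parts = []
            out.append(line)
        else:
            seq_parts.append(line)
    if seq_parts:
        out.append("".join(seq_parts))
    return ("\n".join(out) + "\n") if out else ""


def subsequence(handle, seq_offset, start, length):
    """Read `length` residues from 0-based `start` in a single-line record.

    `handle` must be opened in binary mode; `seq_offset` is the byte
    offset of the first residue. With no newlines inside the sequence,
    the file position is simply seq_offset + start.
    """
    handle.seek(seq_offset + start)
    return handle.read(length)


wrapped = ">seq1 demo\nACGTAC\nGTTTGA\n>seq2\nGGCC\nAA\n"
single = to_single_line_fasta(wrapped)
data = single.encode("ascii")
seq1_offset = data.index(b"\n") + 1  # first byte after the seq1 ID line
piece = subsequence(io.BytesIO(data), seq1_offset, 4, 5)
```

With wrapped lines the same lookup would have to correct the offset for every newline in front of the requested position, which is the bookkeeping the single-line layout avoids.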

Peter Rice
EMBOSS Team

On 27/08/2013 11:03, Niels Larsen wrote:
> Yes, I meant both input and output. It would not be the default, so
> hopefully no programs would get a long-line surprise. The speed
> advantage is a single read for the whole sequence and not having
> to remove newlines. Indexing sub-sequences with locators
> becomes straightforward, because the newlines don't get in the way. Most
> genome packages use it, I think, including mine. Thanks, yes, I
> thought it must be quite easy to do.
>
> Niels
>
> On Tue, 2013-08-27 at 10:41 +0100, Peter Rice wrote:
>> On 27/08/2013 09:40, Niels Larsen wrote:
>>> EMBOSS list,
>>>
>>> I could not find a fasta single-line sequence format. Is it
>>> missing? Having the sequence as a single line does not
>>> violate fasta format, I think, and many programs use it
>>> because of speed and indexing convenience.
>>
>> You mean as an output format I assume? (it would be no problem for input).
>>
>> Easy to implement, but it needs a name so you can specify
>> -osformat fastasingle (for example)
>>
>> It can also be an issue for applications that fail to check for very
>> long input lines.
>>
>> I don't see any real benefit for indexing - you only need to point to
>> the start of the ID line for that. Maybe there are applications that map
>> the sequence string and want to have no extra characters.
>>
>> regards,
>>
>> Peter Rice
>> EMBOSS Team
>>
>



