[EMBOSS] fasta single-line sequence format?

Niels Larsen niels at genomics.dk
Tue Aug 27 10:03:12 UTC 2013


Yes, I meant both input and output. It would not be the default, so
hopefully no program would get a long-line surprise. The speed
advantage is that the whole sequence can be read in a single read,
with no newlines to remove. Indexing sub-sequences with locators
becomes straightforward, since the newlines don't get in the way. Most
genome packages use it, I think, including mine. Thanks, yes, I
thought it must be quite easy to do.
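
To illustrate the locator point, here is a minimal Python sketch. It is
not EMBOSS code, and the names build_index and fetch are made up for
this example. It assumes every record is exactly one header line
followed by one sequence line, with no blank lines in between. With that
layout a sub-sequence is just a seek plus a read, with no newline
arithmetic:

```python
import os
import tempfile


def build_index(path):
    """Map each record ID to (byte offset of sequence start, sequence length).

    Assumes single-line FASTA: one '>' header line, then one sequence line.
    """
    index = {}
    with open(path, "rb") as fh:
        while True:
            header = fh.readline()
            if not header:
                break  # end of file
            seq_start = fh.tell()
            seq_line = fh.readline()  # the whole sequence in a single read
            seq_id = header[1:].split()[0].decode("ascii")
            index[seq_id] = (seq_start, len(seq_line.rstrip(b"\r\n")))
    return index


def fetch(path, index, seq_id, start, length):
    """Return `length` residues from 0-based position `start` of `seq_id`."""
    seq_start, seq_len = index[seq_id]
    if start < 0 or length < 0 or start + length > seq_len:
        raise ValueError("locator outside sequence")
    with open(path, "rb") as fh:
        # No line breaks inside the sequence, so the file offset is
        # simply the sequence start plus the residue position.
        fh.seek(seq_start + start)
        return fh.read(length).decode("ascii")


if __name__ == "__main__":
    # Hypothetical demo data, written to a temporary file.
    with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
        tmp.write(b">seq1 demo\nACGTACGTAC\n>seq2\nTTTTGGGGCC\n")
        demo_path = tmp.name
    try:
        idx = build_index(demo_path)
        print(fetch(demo_path, idx, "seq2", 4, 4))
    finally:
        os.remove(demo_path)
```

With wrapped lines the same lookup is still possible, but the offset
calculation then has to account for the line width and the newline
bytes.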

Niels

On Tue, 2013-08-27 at 10:41 +0100, Peter Rice wrote:
> On 27/08/2013 09:40, Niels Larsen wrote:
> > EMBOSS list,
> >
> > I could not find a fasta single-line sequence format, is it
> > missing? Having the sequence as a single line does not
> > violate fasta format, I think, and many programs use it
> > because of speed and indexing convenience.
> 
> You mean as an output format I assume? (it would be no problem for input).
> 
> Easy to implement, but needs a name so you can specify
> -osformat fastasingle (for example)
> 
> It can also be an issue for applications that fail to check for very 
> long input lines.
> 
> I don't see any real benefit for indexing - you only need to point to 
> the start of the ID line for that. Maybe there are applications that map 
> the sequence string and want to have no extra characters.
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 



