[EMBOSS] fasta single-line sequence format?
Niels Larsen
niels at genomics.dk
Tue Aug 27 10:03:12 UTC 2013
Yes, I meant both input and output. It would not be the default, so
hopefully no programs would get a long-line surprise. The speed
advantage is that the whole sequence can be fetched with a single
read, with no newlines to remove. Indexing sub-sequences with
locators also becomes straightforward, because the newlines don't
get in the way. Most genome packages use it, I think, including
mine. Thanks, yes, I thought it must be quite easy to do.
Niels
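
To illustrate the indexing point, here is a minimal Python sketch. It
assumes a strictly single-line FASTA file (one header line followed by
exactly one sequence line per record) with ASCII content. The function
names index_single_line_fasta and fetch_subsequence are hypothetical
and are not part of EMBOSS or any other package mentioned here.

```python
import os
import tempfile


def index_single_line_fasta(path):
    """Map each sequence ID to (byte offset of first base, sequence length).

    Assumes every record is one '>' header line plus one sequence line.
    Lines that do not start with '>' at record position are skipped.
    """
    index = {}
    with open(path, "rb") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.startswith(b">"):
                continue
            fields = header[1:].split()
            if not fields:
                continue  # header without an ID
            start = handle.tell()  # offset of the first base
            sequence = handle.readline().rstrip(b"\r\n")
            index[fields[0].decode("ascii")] = (start, len(sequence))
    return index


def fetch_subsequence(path, index, seq_id, begin, end):
    """Return bases [begin, end) (0-based) using one seek and one read."""
    start, length = index[seq_id]
    begin = max(0, begin)
    end = min(end, length)
    if begin >= end:
        return ""
    with open(path, "rb") as handle:
        # No newlines inside the sequence, so the byte offset is simply
        # the record start plus the base position.
        handle.seek(start + begin)
        return handle.read(end - begin).decode("ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.fa")
        with open(demo, "wb") as out:
            out.write(b">seq1 example\nACGTACGTAC\n>seq2\nGGGTTTAAA\n")
        idx = index_single_line_fasta(demo)
        print(fetch_subsequence(demo, idx, "seq1", 2, 6))  # GTAC
```

With wrapped sequence lines the same lookup would also need the line
width to correct each offset for the embedded newlines, which is the
extra bookkeeping the single-line layout avoids.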
On Tue, 2013-08-27 at 10:41 +0100, Peter Rice wrote:
> On 27/08/2013 09:40, Niels Larsen wrote:
> > EMBOSS list,
> >
> > I could not find a FASTA single-line sequence format. Is it
> > missing? Having the sequence as a single line does not
> > violate the FASTA format, I think, and many programs use it
> > because of speed and indexing convenience.
>
> You mean as an output format I assume? (it would be no problem for input).
>
> Easy to implement, but it needs a name so you can specify
> -osformat fastasingle (for example)
>
> It can also be an issue for applications that fail to check for very
> long input lines.
>
> I don't see any real benefit for indexing - you only need to point to
> the start of the ID line for that. Maybe there are applications that map
> the sequence string and want to have no extra characters.
>
> regards,
>
> Peter Rice
> EMBOSS Team
>