[EMBOSS] fasta single-line sequence format?

Fields, Christopher J cjfields at illinois.edu
Tue Aug 27 14:08:17 UTC 2013


Is there a name for the FASTQ analog?  Maybe 'unwrapped'? :)

Niels: Re: 'Most genome packages use it': can you specify which ones?  Most genome packages I know also accept standard line-wrapped FASTA, so coding an indexing scheme for a non-standard FASTA alone seems… tricky.  Unless you intend to allow both, and 'unwrapped' is just an optimization.
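
To make the indexing trade-off concrete, here is a rough Python sketch of the offset arithmetic in both cases. It is an illustration only, not how EMBOSS or any particular genome package does it: the function names are made up, and it assumes a fixed line width, one-byte newlines, and that the byte offset of the first residue is already known.

```python
def fetch_unwrapped(data: bytes, seq_start: int, start: int, length: int) -> bytes:
    """Slice a subsequence when the whole sequence sits on a single line."""
    begin = seq_start + start
    return data[begin:begin + length]


def fetch_wrapped(data: bytes, seq_start: int, start: int, length: int, width: int) -> bytes:
    """Slice a subsequence from lines wrapped at a fixed width (assumed)."""
    end = start + length
    # Every complete line before a residue adds one newline byte to its offset.
    begin_off = seq_start + start + start // width
    end_off = seq_start + end + (end - 1) // width
    # The slice may span line breaks, so the newlines still have to be removed.
    return data[begin_off:end_off].replace(b"\n", b"")


seq = b"ACGTACGTACGTAC"
header = b">seq1\n"
unwrapped = header + seq + b"\n"
wrapped = header + b"ACGTA\nCGTAC\nGTAC\n"  # same sequence, 5 residues per line

sub_unwrapped = fetch_unwrapped(unwrapped, len(header), 3, 6)
sub_wrapped = fetch_wrapped(wrapped, len(header), 3, 6, 5)
print(sub_unwrapped, sub_wrapped)
```

Both calls return the same residues. The wrapped case costs one integer division per coordinate plus a newline strip, and it only works if every line (except possibly the last) has the same width.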

chris f.

On Aug 27, 2013, at 8:44 AM, Peter Rice <ricepeterm at yahoo.co.uk> wrote:

> Suggestions please for a format name to describe fasta format with the sequence always on a single line
> 
> (needed for output only - it will be valid as format 'fasta' for input).
> 
> Peter Rice
> EMBOSS Team
> 
> On 27/08/2013 11:03, Niels Larsen wrote:
>> Yes, i meant both input and output. It would not be default, so
>> hopefully no programs should get a long-line surprise. The speed
>> advantage is a single read for the whole sequence and not having
>> to remove newlines. Indexing sub-sequences with locators
>> becomes straightforward, the newlines don't get in the way. Most
>> genome packages use it, i think, including mine. Thanks, yes i
>> thought it must be quite easy to do ..
>> 
>> Niels
>> 
>> On Tue, 2013-08-27 at 10:41 +0100, Peter Rice wrote:
>>> On 27/08/2013 09:40, Niels Larsen wrote:
>>>> EMBOSS list,
>>>> 
>>>> I could not find a fasta single-line sequence format, is it
>>>> missing? having the sequence as a single line does not
>>>> violate fasta format i think, and many programs use it
>>>> because of speed and indexing convenience.
>>> 
>>> You mean as an output format I assume? (it would be no problem for input).
>>> 
>>> Easy to implement, but it needs a name so you can specify
>>> -osformat fastasingle (for example)
>>> 
>>> It can also be an issue for applications that fail to check for very
>>> long input lines.
>>> 
>>> I don't see any real benefit for indexing - you only need to point to
>>> the start of the ID line for that. Maybe there are applications that map
>>> the sequence string and want to have no extra characters.
>>> 
>>> regards,
>>> 
>>> Peter Rice
>>> EMBOSS Team
>>> 
>> 
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss




