[EMBOSS] fasta single-line sequence format?
Fields, Christopher J
cjfields at illinois.edu
Tue Aug 27 14:08:17 UTC 2013
Is there a name for the FASTQ analog? Maybe 'unwrapped'? :)
Niels: Re: 'Most genome packages use it': can you specify? Most genome packages I know allow the flexibility to use standard line-wrapped FASTA as well, so coding an indexing scheme for a non-standard FASTA alone seems… tricky. Unless you intend to allow both, and 'unwrapped' is just for optimization.
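
To make the indexing point concrete, here is a minimal sketch of the offset arithmetic, assuming fixed-width wrapping and single-byte residues. The function names and the parameters `seq_start`, `line_bases` and `line_bytes` are hypothetical and not taken from EMBOSS or any genome package. Wrapped FASTA can still be indexed, but the locator needs the line geometry; unwrapped needs only the start of the sequence line.

```python
def wrapped_offset(seq_start, pos, line_bases, line_bytes):
    """Byte offset of 0-based residue `pos` in a line-wrapped record.

    seq_start  -- byte offset of the first residue (just after the ID line)
    line_bases -- residues per full line (assumed constant within the record)
    line_bytes -- bytes per full line, including the line terminator
    """
    full_lines, within_line = divmod(pos, line_bases)
    return seq_start + full_lines * line_bytes + within_line


def unwrapped_offset(seq_start, pos):
    """Byte offset of 0-based residue `pos` when the sequence is one line."""
    return seq_start + pos


if __name__ == "__main__":
    wrapped = b">id1\nACGTAC\nGTTTGA\nCC\n"   # 6 residues per line, "\n" endings
    unwrapped = b">id1\nACGTACGTTTGACC\n"
    start = len(b">id1\n")
    i = wrapped_offset(start, 7, line_bases=6, line_bytes=7)
    j = unwrapped_offset(start, 7)
    # Both locators should land on the same residue.
    print(wrapped[i:i + 1], unwrapped[j:j + 1])
```

Either way the lookup is a seek plus a read; the difference is that a range read from a wrapped record still has to strip the embedded newlines.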
chris f.
On Aug 27, 2013, at 8:44 AM, Peter Rice <ricepeterm at yahoo.co.uk>
wrote:
> Suggestions please for a format name to describe fasta format with the sequence always on a single line
>
> (needed for output only - it will be valid as format 'fasta' for input).
>
> Peter Rice
> EMBOSS Team
>
> On 27/08/2013 11:03, Niels Larsen wrote:
>> Yes, I meant both input and output. It would not be the default, so
>> hopefully no programs should get a long-line surprise. The speed
>> advantage is a single read for the whole sequence and not having
>> to remove newlines. Indexing sub-sequences with locators
>> becomes straightforward, since the newlines don't get in the way. Most
>> genome packages use it, I think, including mine. Thanks, yes I
>> thought it must be quite easy to do ..
>>
>> Niels
>>
>> On Tue, 2013-08-27 at 10:41 +0100, Peter Rice wrote:
>>> On 27/08/2013 09:40, Niels Larsen wrote:
>>>> EMBOSS list,
>>>>
>>>> I could not find a fasta single-line sequence format, is it
>>>> missing? Having the sequence as a single line does not
>>>> violate fasta format, I think, and many programs use it
>>>> because of speed and indexing convenience.
>>>
>>> You mean as an output format I assume? (it would be no problem for input).
>>>
>>> Easy to implement, but needs a name so you can specify
>>> -osformat fastasingle (for example)
>>>
>>> It can also be an issue for applications that fail to check for very
>>> long input lines.
>>>
>>> I don't see any real benefit for indexing - you only need to point to
>>> the start of the ID line for that. Maybe there are applications that map
>>> the sequence string and want to have no extra characters.
>>>
>>> regards,
>>>
>>> Peter Rice
>>> EMBOSS Team
>>>
>>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
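
For readers following the thread, the output side of the request can be sketched in a few lines. This is an illustration only, not the EMBOSS implementation; `unwrap_fasta` is a hypothetical helper name, and the sketch assumes plain FASTA text where ID lines start with `>`.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    `lines` is any iterable of text lines (for example an open file).
    Blank lines are skipped; text before the first '>' line is dropped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                # Flush the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


if __name__ == "__main__":
    example = [">a desc\n", "ACGT\n", "AC\n", ">b\n", "TT\n"]
    for out in unwrap_fasta(example):
        print(out)
```

The result is still valid as ordinary fasta on input, which matches the point made in the thread that only the output side needs a new format name.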