[EMBOSS] fasta single-line sequence format?

Peter Rice ricepeterm at yahoo.co.uk
Tue Aug 27 15:59:08 UTC 2013


On 27/08/2013 16:18, Niels Larsen wrote:
> Niels: Re: 'Most genome packages use it': can you specify?  Most genome
> packages I know allow the flexibility to use standard line-wrapped FASTA
> as well, so coding an indexing scheme for a non-standard FASTA alone
> seems… tricky.  Unless you intend on allowing both, and 'unwrapped' is
> just for optimization.
>
> C Fields: Yes, read both but write unwrapped (by default) so that steps
> in a workflow can use the faster unwrapped format. Read routines that
> "taste" the file by looking at the first record and derive the format,
> are much faster than when reading/writing wrapped. And less work for
> the user/caller.

Ah, but can you trust the first record? If it is a relatively short 
sequence it may be on one line, but later sequences may wrap. Depends on 
the record limit.
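
To illustrate the concern, here is a minimal Python sketch (not EMBOSS code; the function names and the sample data are made up for this example). It compares a check that "tastes" only the first record against one that scans every record. It assumes records start with '>' and ignores blank lines.

```python
import io


def first_record_is_unwrapped(handle):
    """Taste only the first record: True if its sequence fits on one line."""
    seq_lines = 0
    seen_header = False
    for line in handle:
        if line.startswith(">"):
            if seen_header:
                break  # reached the second record, stop tasting
            seen_header = True
        elif line.strip() and seen_header:
            seq_lines += 1
    return seq_lines <= 1


def all_records_unwrapped(handle):
    """Scan the whole file: True only if no record spans several lines."""
    seq_lines = 0
    for line in handle:
        if line.startswith(">"):
            seq_lines = 0  # new record, reset the per-record counter
        elif line.strip():
            seq_lines += 1
            if seq_lines > 1:
                return False
    return True


# Hypothetical input: a short first record followed by a wrapped one.
sample = ">short\nACGT\n>long\nACGTACGT\nACGTACGT\n"

tasted = first_record_is_unwrapped(io.StringIO(sample))
scanned = all_records_unwrapped(io.StringIO(sample))
```

With this input the taste test reports unwrapped while the full scan does not, so a reader that trusts the first record would still need a fallback for later wrapped records.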

As to the format name .... a name beginning 'fasta-' would be easiest to 
document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.

regards,

Peter Rice
EMBOSS Team


