[EMBOSS] fasta single-line sequence format?
ricepeterm at yahoo.co.uk
Tue Aug 27 15:59:08 UTC 2013
On 27/08/2013 16:18, Niels Larsen wrote:
> Niels: Re: 'Most genome packages use it': can you specify? Most genome
> packages I know allow the flexibility to use standard line-wrapped FASTA
> as well, so coding an indexing scheme for a non-standard FASTA alone
> seems… tricky. Unless you intend on allowing both, and 'unwrapped' is
> just for optimization.
> C Fields: Yes, read both but write unwrapped (by default) so that steps in
> a workflow can use the faster unwrapped format. Read routines that inspect
> the file by looking at the first record and derive the format are much faster
> than when reading/writing wrapped. And less work for the user/caller.
Ah, but can you trust the first record? If it is a relatively short
sequence it may be on one line, but later sequences may wrap. Depends on
the record limit.
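To illustrate the worry about trusting the first record, here is a minimal Python sketch. The function names are hypothetical and not taken from EMBOSS or any library. The reader accepts both wrapped and single-line FASTA and counts sequence lines per record. A short first record fits on one line even under 60-column wrapping, so a detector that stops after record 1 can misclassify the file. Only a check over every record is safe.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (header, sequence, n_sequence_lines) for each record.

    Accepts wrapped and single-line ("unwrapped") FASTA alike.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks), len(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks), len(chunks)


def first_record_looks_unwrapped(text: str) -> bool:
    """The shortcut in question: judge the whole file by record 1 only."""
    for _, _, n_lines in read_fasta(text.splitlines()):
        return n_lines <= 1
    return True  # empty input: nothing contradicts the guess


def all_records_unwrapped(text: str) -> bool:
    """The safe check: every record has at most one sequence line."""
    return all(n <= 1 for _, _, n in read_fasta(text.splitlines()))


# A short first record, then a long record wrapped at 60 columns.
example = ">short\nACGTACGT\n>long\n" + "A" * 60 + "\n" + "C" * 20 + "\n"
print(first_record_looks_unwrapped(example))  # True
print(all_records_unwrapped(example))         # False
```

A writer that always emits one of the two layouts, plus a distinct format name, avoids having to sniff at all.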
As to the format name .... a name beginning 'fasta-' would be easiest to
document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
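On the indexing question in the quoted exchange: an index over unwrapped FASTA only needs each record's sequence start offset. Wrapped FASTA also needs the residues per line and bytes per line, and only works if every full line in a record has the same width. This is similar to what a samtools faidx .fai index records. The sketch below is a minimal Python illustration, assuming '\n' or '\r\n' line endings and uniformly wrapped records. The names are hypothetical.

```python
from typing import Optional


def residue_offset(seq_start: int, pos: int,
                   line_bases: Optional[int], line_bytes: Optional[int]) -> int:
    """Byte offset of 0-based residue `pos` in a record.

    line_bases=None means unwrapped (the whole sequence on one line).
    """
    if line_bases is None:
        # Unwrapped: residues are contiguous bytes.
        return seq_start + pos
    # Wrapped: skip whole lines (including newline bytes), then step in.
    return seq_start + (pos // line_bases) * line_bytes + pos % line_bases


def fetch(data: bytes, seq_start: int, line_bases: Optional[int],
          line_bytes: Optional[int], start: int, end: int) -> str:
    """Return residues [start, end) by slicing raw bytes, dropping line ends."""
    if end <= start:
        return ""
    lo = residue_offset(seq_start, start, line_bases, line_bytes)
    hi = residue_offset(seq_start, end - 1, line_bases, line_bytes) + 1
    return data[lo:hi].decode("ascii").replace("\r", "").replace("\n", "")


# The same 23-residue record, wrapped at 10 columns and unwrapped.
wrapped = b">r1\nACGTACGTAC\nGGGGGTTTTT\nCCC\n"
unwrapped = b">r1\nACGTACGTACGGGGGTTTTTCCC\n"
# Sequence starts at byte 4 (after ">r1\n"); wrapped lines are 10 residues + 1 newline.
print(fetch(wrapped, 4, 10, 11, 8, 13))        # ACGGG
print(fetch(unwrapped, 4, None, None, 8, 13))  # ACGGG
```

The unwrapped case needs less stored per record and simpler arithmetic. However, the wrapped case stays cheap as long as the line width is uniform, which is one argument for supporting both.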