[EMBOSS] fasta single-line sequence format?
Niels Larsen
niels at genomics.dk
Tue Aug 27 16:40:07 UTC 2013
> Ah, but can you trust the first record? If it is a relatively short
> sequence it may be on one line, but later sequences may wrap. Depends on
> the record limit.
>
> As to the format name .... a name beginning 'fasta-' would be easiest to
> document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
>
Indeed, file sampling isn't watertight, but I still think the
programmatic equivalent of this:

    head -n 2000 file | grep '^>' | wc --lines

(where the output number is 1000 if unwrapped) is much faster than
being watertight and bulletproof, given the very large files being
handled. Besides, when did I last see a fasta file with 580 long
sequence lines .. can't think of it.
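
A minimal sketch of that sampling check in Python, for illustration only.
The function name sample_is_single_line and the default of 2000 lines are
assumptions made here, not part of EMBOSS. It counts header lines in the
first lines of the file and treats the sample as unwrapped when half of
them are headers:

```python
from itertools import islice


def sample_is_single_line(path, sample_lines=2000):
    """Guess whether a FASTA file keeps each sequence on a single line.

    Heuristic only: it inspects the first `sample_lines` lines, so a file
    whose early records are short but whose later records wrap will still
    be reported as single-line.
    """
    with open(path, "r") as handle:
        lines = list(islice(handle, sample_lines))

    headers = sum(1 for line in lines if line.startswith(">"))

    # Unwrapped layout alternates header, sequence, header, sequence, ...
    # so half of the sampled lines are headers. The +1 tolerates a sample
    # that happens to end on a header line.
    return headers > 0 and headers == (len(lines) + 1) // 2
```

Because only the sample is read, the cost does not grow with file size,
which is the point of the shortcut. Blank lines inside the sample count
as extra non-header lines, so such a file is reported as wrapped.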
Niels L
> regards,
>
> Peter Rice
> EMBOSS Team