[EMBOSS] fasta single-line sequence format?

Niels Larsen niels at genomics.dk
Tue Aug 27 16:40:07 UTC 2013


> Ah, but can you trust the first record? If it is a relatively short 
> sequence it may be on one line, but later sequences may wrap. Depends on 
> the record limit.
> 
> As to the format name .... a name beginning 'fasta-' would be easiest to 
> document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
> 

Indeed, file sampling isn't watertight, but I still think the programmatic
equivalent of this:  head -n 2000 file | grep '^>' | wc --lines (where the
output is 1000 if the sequences are unwrapped) is much faster than a
watertight, bulletproof check, given the very large files being handled.
Besides, when did I last see a FASTA file with 580 long sequence lines? I
can't think of one.
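A minimal sketch of that programmatic equivalent, assuming plain-text FASTA
read through an ordinary file handle. It mirrors the head | grep | wc
pipeline: sample the first 2000 lines, count header lines, and call the file
single-line if headers make up half the sample (rounded up for an odd or
short sample). The function name looks_single_line is hypothetical, and like
the shell version it is a heuristic, not a validator.

```python
from itertools import islice
from typing import TextIO


def looks_single_line(handle: TextIO, sample_lines: int = 2000) -> bool:
    """Guess whether a FASTA stream stores each sequence on a single line.

    Hypothetical helper: mirrors `head -n 2000 file | grep '^>' | wc --lines`
    and expects the header count to be half the sampled lines when unwrapped.
    """
    # Read at most `sample_lines` lines, like `head -n 2000`.
    lines = [ln.rstrip("\r\n") for ln in islice(handle, sample_lines)]
    if not lines:
        return False
    # Count header lines, like `grep '^>' | wc --lines`.
    headers = sum(1 for ln in lines if ln.startswith(">"))
    # Unwrapped records alternate header/sequence, so headers are half the
    # sample; round up in case the sample ends right after a header line.
    return headers == (len(lines) + 1) // 2
```

For example, with open("seqs.fa") as fh: looks_single_line(fh) (file name
illustrative). Counting headers can be fooled by unusual layouts, which is the
trade-off accepted above in exchange for speed on very large files.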

Niels L

> regards,
> 
> Peter Rice
> EMBOSS Team



