[Open-bio-l] White space in FASTQ files?

Peter biopython at maubp.freeserve.co.uk
Mon Aug 10 13:36:26 UTC 2009


On Mon, Aug 10, 2009 at 2:09 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> Peter C. wrote:
>> Other than the special case of new lines which we have already covered
>> (allowed but line wrapping is discouraged), should FASTQ sequence
>> lines (and indeed the quality lines) ever be allowed to include white
>> space (e.g. spaces and tabs)? I've never seen this in a real FASTQ
>> file, and would like to suggest this be considered an error.
>>
>> Comments? Counter suggestions?
>
> I am happy adding a warning message in EMBOSS for this.
>

So you are thinking you'll try to cope with white space, and issue a
warning? This sounds dangerous to me. One of the properties of a
FASTQ file is that the sequence string and the quality string should
be the same length (after removing the line wrapping). Allowing white
space in these strings makes that ambiguous. What if the sequence has
white space but not the quality? What if they both have white space
but in different positions?

Treating any white space (other than the new line characters) as an
error seems much safer. If any real files turn out to do this, we
can revisit the decision.
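
To illustrate the strict approach, here is a minimal sketch in Python.
The helper name check_fastq_record is hypothetical (it is not an
existing EMBOSS or Biopython function), and it assumes the sequence
and quality strings are passed in after the line wrapping has already
been removed.

```python
import re

# Hypothetical helper for illustration only; not an EMBOSS or Biopython API.
# Matches any white space character (space, tab, stray new line, etc.).
_WHITE_SPACE = re.compile(r"\s")


def check_fastq_record(title, seq, qual):
    """Validate one FASTQ record and return its length.

    seq and qual are assumed to be the sequence and quality strings
    with line wrapping already removed. Any remaining white space is
    treated as an error rather than silently stripped.
    """
    for name, value in (("sequence", seq), ("quality", qual)):
        match = _WHITE_SPACE.search(value)
        if match:
            raise ValueError(
                "%s: white space at offset %d in %s string"
                % (title, match.start(), name)
            )
    # With white space ruled out, the length comparison is unambiguous.
    if len(seq) != len(qual):
        raise ValueError(
            "%s: sequence length %d does not match quality length %d"
            % (title, len(seq), len(qual))
        )
    return len(seq)
```

With this, check_fastq_record("read1", "AC GT", "IIII") raises a
ValueError instead of guessing whether the space should be dropped.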

Peter


