[EMBOSS] FASTQ records with no sequence?

Peter biopython at maubp.freeserve.co.uk
Thu Jul 30 15:35:25 UTC 2009


Hi all,

On the continuing topic of the nebulous FASTQ format, are there
any strong views as to whether a FASTQ file could hold records
without a sequence (and therefore no quality scores)? This could
make sense as output from an (aggressive) quality filter.

This was a discussion I meant to start on the OBF list, not the
EMBOSS list - so here is the start of the thread:
http://lists.open-bio.org/pipermail/emboss/2009-July/003707.html

Basically in some contexts an empty FASTQ record makes sense,
so perhaps we should include examples of this for our test suite.
However, there is more than one reasonable way to represent
such a record (either omitting the sequence and quality lines, or
including blank sequence and quality lines).

On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> As we are recommending no line wrapping on output this means
>> typical FASTQ records would be four lines - so doing the same
>> makes sense here too.
>
> I vote for 4 lines on output.

If we want to allow zero length sequences, then yes, I would also
vote for the 4 line output (i.e. blank lines for the sequence and
the quality string).
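
To make the four line form concrete, here is a minimal sketch of
a writer in Python. The function name format_fastq is made up for
illustration (it is not an EMBOSS or Biopython function), and it
assumes unwrapped output with a bare '+' line:

```python
def format_fastq(name, seq, qual):
    """Return one unwrapped, four-line FASTQ record as a string.

    Hypothetical helper for illustration only.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # A zero-length record needs no special case: empty seq and qual
    # strings simply produce two blank lines.
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

With this, format_fastq("empty", "", "") gives the title line, a
blank line, the '+' line and another blank line.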

> It should be possible to allow zero lines on input depending on
> where the '+' check is.

Yes, I'm pretty sure a parser could cope with any of the zero length
sequence FASTQ examples I gave.
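
As a rough sketch of how that might look (again in Python, with
the hypothetical name parse_fastq, and assuming unwrapped records
only), the '+' check decides whether the sequence line was omitted.
For an empty sequence, the quality line is optional too:

```python
def parse_fastq(lines):
    """Yield (name, seq, qual) tuples from unwrapped FASTQ lines.

    Hypothetical sketch: accepts zero-length records written either
    with blank sequence/quality lines or with those lines omitted.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    i, n = 0, len(lines)
    while i < n:
        if not lines[i]:
            i += 1  # skip stray blank lines between records
            continue
        if not lines[i].startswith("@"):
            raise ValueError("expected '@' title line, got %r" % lines[i])
        name = lines[i][1:]
        i += 1
        # Sequence line: present, blank, or omitted (next line is '+').
        if i < n and not lines[i].startswith("+"):
            seq = lines[i]
            i += 1
        else:
            seq = ""
        if i >= n or not lines[i].startswith("+"):
            raise ValueError("missing '+' line in record %r" % name)
        i += 1
        if seq:
            # Non-empty sequence: the next line must be the quality
            # string, even if it happens to start with '@' or '+'.
            if i >= n or len(lines[i]) != len(seq):
                raise ValueError("bad quality line in record %r" % name)
            qual = lines[i]
            i += 1
        else:
            # Empty sequence: consume a blank quality line if present;
            # anything else is taken as the start of the next record.
            qual = ""
            if i < n and not lines[i]:
                i += 1
        yield name, seq, qual
```

Since an empty record must have an empty quality string, any
non-blank line after its '+' line can only be the next '@' title,
so there is no ambiguity with quality strings that start with '@'.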

Peter


