[Open-bio-l] Naming for FASTQ example files

Thu Aug 6 08:17:07 UTC 2009

Hi all,

I am planning on compiling a set of FASTQ files, for use by
Biopython, BioPerl, EMBOSS and anyone else that wants to test a
parser. Modest-sized contributions will be welcome (no big files
though).

I will have two types of files: valid ones, and invalid ones. The
basic idea is any parser should understand what we consider to be
valid files (we may need to provide matching FASTA and QUAL files or
something like this for verification), but also reject all the files
we consider to be invalid.
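
To make the accept/reject idea concrete, here is a rough sketch of how a
project might run its parser over both groups of files. Everything in it
is an assumption for illustration: `check_parser` is a made-up helper, and
`parse` stands for whatever callable a given project uses, assumed here to
raise ValueError when it rejects a file.

```python
from typing import Callable, Iterable, List


def check_parser(parse: Callable[[str], object],
                 valid_paths: Iterable[str],
                 invalid_paths: Iterable[str]) -> List[str]:
    """Return a list of failure messages (an empty list means all passed).

    `parse` is a hypothetical callable taking a file path; it is assumed
    to raise ValueError for anything it does not accept as FASTQ.
    """
    failures = []
    # Every file in the valid group must parse without complaint.
    for path in valid_paths:
        try:
            parse(path)
        except ValueError as err:
            failures.append(f"{path}: valid file rejected ({err})")
    # Every file in the invalid group must be rejected.
    for path in invalid_paths:
        try:
            parse(path)
        except ValueError:
            continue  # rejected, as expected
        failures.append(f"{path}: invalid file accepted")
    return failures
```

Returning the list of failures, rather than stopping at the first one,
means a single run shows every example a parser disagrees with.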

Regarding names, does "error_*.fastq" or "invalid_*.fastq" sound fine?

Any preference for meaningful names ("error_qual_short.fastq",
"error_qual_bad_char.fastq", ...) versus numbers ("error_001.fastq",
"error_002.fastq", ...)? Either way I think a README file would need
to accompany the dataset stating what we think makes each example
invalid (e.g. quality string shorter than sequence, invalid character
in quality string, ...).
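
As an illustration of the two failure cases just mentioned, a minimal
strict reader might look like the sketch below. It is not taken from any
of the projects named above. It assumes unwrapped four-line records (real
FASTQ files may wrap sequence and quality lines) and treats printable
ASCII 33 to 126 as the allowed quality characters; the function name
`iter_fastq_strict` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def iter_fastq_strict(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, sequence, quality) tuples, or raise ValueError.

    Assumption: each record is exactly four lines (no line wrapping).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("record does not start with '@'")
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError("truncated record") from None
        if not plus.startswith("+"):
            raise ValueError("missing '+' separator line")
        # Case 1: quality string shorter (or longer) than the sequence.
        if len(qual) != len(seq):
            raise ValueError("quality and sequence lengths differ")
        # Case 2: character outside the assumed range ASCII 33 to 126.
        for char in qual:
            if not 33 <= ord(char) <= 126:
                raise ValueError(f"invalid quality character {char!r}")
        yield header[1:], seq, qual
```

Each ValueError message here corresponds to the kind of one-line reason
the README could give for an invalid example.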

Peter