[Open-bio-l] Naming for FASTQ example files

Peter biopython at maubp.freeserve.co.uk
Thu Aug 27 15:26:23 UTC 2009


On Sat, Aug 8, 2009 at 1:53 PM, Peter<biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Aug 6, 2009 at 9:17 AM, Peter<biopython at maubp.freeserve.co.uk> wrote:
>> Hi all,
>>
>> I am planning on compiling a set of FASTQ files for use by
>> Biopython, BioPerl, EMBOSS and anyone else who wants to test a
>> parser. Modest-size contributions are welcome (no big files
>> though).
>>
>> I will have two types of files: valid ones, and invalid ones. The
>> basic idea is any parser should understand what we consider to be
>> valid files (we may need to provide matching FASTA and QUAL files or
>> something like this for verification), but also reject all the files
>> we consider to be invalid.
>> ...
>
> I've gone for "error_*.fastq" and have tried to use meaningful names
> rather than numbers. Currently these files are only in the Biopython
> repository (under biopython/Tests/Quality), but could be added to the
> (currently) unused Biodata repository - although that is still on CVS:
>
> http://lists.open-bio.org/pipermail/open-bio-l/2009-January/000511.html
>
> As these examples are all small and we don't expect to change them,
> I could also just email them (off the mailing list) to EMBOSS/BioPerl
> people directly on request.

Chris Fields has already included the original "error_*.fastq" files in
BioPerl SVN as test cases. Peter Rice has pointed out a minor error
in "error_short_qual.fastq" which I have now corrected (it had a
short sequence, not a short quality line), and after discussion we
have come up with a few more truncation examples:

error_trunc_in_title.fastq
error_trunc_in_seq.fastq
error_trunc_in_plus.fastq
error_trunc_in_qual.fastq
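To illustrate what rejecting these truncation cases involves, here is a minimal Python sketch of a strict FASTQ reader. It is not Biopython's parser. The names `parse_fastq`, `FastqError` and `rejects` are hypothetical. The sketch assumes the simple unwrapped four-line layout (title, sequence, `+` line, quality) and does not handle multi-line records. The inline test strings are invented and are not copied from the `error_*.fastq` files.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqError(ValueError):
    """Raised for a malformed or truncated FASTQ record (hypothetical)."""


class Record(NamedTuple):
    title: str
    seq: str
    qual: str


def _next_line(handle: TextIO, what: str) -> str:
    """Read one line; end of file here means the record was truncated."""
    line = handle.readline()
    if not line:
        raise FastqError(f"file truncated: expected {what} line")
    return line.rstrip("\r\n")


def parse_fastq(handle: TextIO) -> Iterator[Record]:
    """Strictly parse unwrapped four-line FASTQ records (a sketch)."""
    while True:
        first = handle.readline()
        if not first:
            return  # clean end of file between records is fine
        title = first.rstrip("\r\n")
        if not title.startswith("@"):
            raise FastqError(f"title line must start with '@': {title!r}")
        seq = _next_line(handle, "sequence")
        plus = _next_line(handle, "'+'")
        if not plus.startswith("+"):
            raise FastqError(f"expected '+' line, got {plus!r}")
        # An optional repeated title on the '+' line must match exactly.
        if len(plus) > 1 and plus[1:] != title[1:]:
            raise FastqError("title repeated on '+' line does not match")
        qual = _next_line(handle, "quality")
        # Catches a short quality line, including one cut off by truncation.
        if len(qual) != len(seq):
            raise FastqError(
                f"sequence length {len(seq)} != quality length {len(qual)}"
            )
        yield Record(title[1:], seq, qual)


def rejects(text: str) -> bool:
    """True if the strict parser raises FastqError on this input."""
    try:
        list(parse_fastq(io.StringIO(text)))
    except FastqError:
        return True
    return False


if __name__ == "__main__":
    good = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+r2\nII\n"
    print([r.title for r in parse_fastq(io.StringIO(good))])
    # Invented inputs mimicking each error case, not the real test files.
    bad = {
        "short_qual": "@r1\nACGT\n+\nIII\n",
        "trunc_in_title": "@r1\nACGT\n+\nIIII\n@r",
        "trunc_in_seq": "@r1\nACGT\n+\nIIII\n@r2\nGG",
        "trunc_in_plus": "@r1\nACGT\n+\nIIII\n@r2\nGG\n+",
        "trunc_in_qual": "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nI",
    }
    for name, text in bad.items():
        print(name, "rejected" if rejects(text) else "ACCEPTED")
```

A file cut off partway through a record is rejected, either because end of file arrives before all four lines are read or because the sequence and quality lengths disagree. A file that ends cleanly between records, with or without a final newline, is accepted. A real parser would also need to handle line-wrapped records and check quality characters for the relevant FASTQ variant.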

Again, you can grab these five files (four new, one updated) from
Biopython CVS/git, and I will also be emailing Chris & Peter R
directly.

Peter C.


