[Biopython-dev] EMBOSS format name "fastq-sanger" in Biopython?

Peter biopython at maubp.freeserve.co.uk
Fri Jul 24 10:48:04 UTC 2009


On Mon, Jul 20, 2009 at 6:57 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all at Biopython (and EMBOSS-dev CC'd),
>
> Now that EMBOSS 6.1.0 is out I've started checking it against Biopython.
> As I mentioned on the Biopython mailing list a week ago, in particular I'd
> like to make sure we agree on the various FASTQ variants. I'm waiting
> for EMBOSS to update the documentation on their website, but as I
> recall from talking to Peter Rice at BOSC/ISMB 2009 and a quick test
> this afternoon, they are using:
>
> fastq - FASTQ where the qualities are ignored (useful for input?)
> fastq-sanger - Standard Sanger style FASTQ using PHRED offset 33
> fastq-solexa - Early Solexa/Illumina FASTQ, Solexa scores offset 64
> fastq-illumina - Illumina 1.3+ FASTQ using PHRED offset 64
>
> I was expecting "fastq" to be an EMBOSS input only format given
> how I had understood this to be interpreted (ignore the qualities).
> ... I was however surprised that using "fastq" as an output format
> in EMBOSS seqret gives quality strings of double quote characters.

To be more precise, it looks like "fastq" as an output format in
EMBOSS is an alias for "fastq-sanger" (to be confirmed), see:
http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000599.html
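
For anyone wanting to check this by hand, here is a rough sketch of how
the offsets listed above map quality characters to scores. It is plain
Python, not EMBOSS or Biopython code, and the names OFFSETS,
decode_qualities and solexa_to_phred are made up for illustration. A
double quote is ASCII 34, so read as "fastq-sanger" (offset 33) it
decodes to PHRED 1:

```python
import math

# ASCII offsets for the variants listed in the quoted message.
# This table and the helper names are illustrative assumptions only.
OFFSETS = {
    "fastq-sanger": 33,    # PHRED scores, ASCII offset 33
    "fastq-solexa": 64,    # Solexa scores, ASCII offset 64
    "fastq-illumina": 64,  # PHRED scores, ASCII offset 64
}


def decode_qualities(quality_string, variant):
    """Return the integer scores encoded in a FASTQ quality string."""
    offset = OFFSETS[variant]
    return [ord(char) - offset for char in quality_string]


def solexa_to_phred(solexa_score):
    """Convert a Solexa score to the equivalent PHRED score (a float)."""
    return 10 * math.log10(10 ** (solexa_score / 10.0) + 1)


# A quality string made of double quotes, read as Sanger FASTQ.
print(decode_qualities('""""', "fastq-sanger"))  # prints [1, 1, 1, 1]
```

Note that "fastq-solexa" and "fastq-illumina" share offset 64 but hold
different kinds of score, which is why the two need separate names.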

In any case, it would still make sense to include "fastq-sanger" as
an alias for the Sanger standard FASTQ files in Biopython's SeqIO,
especially if BioPerl is also going to use that name (to be confirmed):
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030688.html
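
As a rough idea of what such an alias could look like, here is a
minimal sketch. It is not Biopython's actual code: FORMAT_ALIASES and
resolve_format are hypothetical names, and it assumes the existing
SeqIO name for Sanger FASTQ is "fastq".

```python
# Hypothetical alias table: alternative name -> name already supported.
# Assumes "fastq" is the existing name for Sanger style FASTQ.
FORMAT_ALIASES = {
    "fastq-sanger": "fastq",
}


def resolve_format(name):
    """Map an alias to its canonical format name; pass other names through."""
    return FORMAT_ALIASES.get(name, name)


print(resolve_format("fastq-sanger"))  # prints fastq
print(resolve_format("fastq-solexa"))  # prints fastq-solexa
```

Resolving the alias once, before looking up the parser or writer, would
keep the two names from drifting apart.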

Peter


