[Biopython-dev] [Bug 2848] SeqIO fastq routines reject valid quality scores

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Jun 4 19:22:55 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2848





------- Comment #5 from pmmagic at gmail.com  2009-06-04 15:22 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> > P.S. The reason I originally used 0 to 90 was this line in the MAQ page and
> > the fq_all2std.pl text:
> > 
> > "In the quality string, if you can see a character with its ASCII code higher
> > than 90, probably your file is in the Solexa/Illumina format."
> 
> Ignore that - I was thinking PHRED scores but they are talking ASCII codes. I
> guess they consider PHRED scores of 57+ to be rare.
> 
> (In reply to comment #0)
> > The fastq routines in SeqIO.QualityIO reject what I believe are valid quality
> > scores.
> > 
> > According to the MAQ website (http://maq.sourceforge.net/fastq.shtml; I don't
> > know if this is definitive), valid quality values in Sanger style FASTQ format
> > are:
> > 
> > <qual>  :=      [!-~\n]+
> > 
> > This corresponds to Phred quality scores in the range 0-93.
> 
> Yes, it does:
> 
> ord("!")-33 = 0
> ord("~")-33 = 93
> 
> The maq website isn't definitive, but it was written by people at Sanger where
> the FASTQ format was invented, and to my knowledge is the closest thing to an
> official description of the format.
> 
> (In reply to comment #2)
> > Rereading that MAQ page, you are probably right about allowing 0-93 rather
> > than 0-90 for PHRED scores.
> 
> Fixed in CVS.
> 
> > Could you pull out a few valid FASTQ reads showing this problem as a short
> > example file we can use for a unit test? (and attach it to this bug)...
> 
> On re-reading your bug report, I'm not sure if you actually have a file where
> this is a problem, or if you just noticed the minor discrepancy in the
> threshold?
> 

Hi Peter,

The problem arises in parsing the fastq formatted consensus mappings
produced by MAQ, so these are "mapping qualities" rather than read
qualities directly.

These mapping qualities, however, are in the same scale as Phred
quality scores (http://maq.sourceforge.net/qual.shtml) and MAQ's fastq
output is Sanger style.
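
For reference, the Phred scale relates a score Q to an error probability P
by Q = -10 * log10(P). A minimal sketch of that conversion follows; the
function names are illustrative only and are not part of Biopython or MAQ.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    return -10.0 * math.log10(p)


# The top of the Sanger FASTQ range (Q = 93) is a very small error probability.
for q in (0, 30, 90, 93):
    print(q, phred_to_error_prob(q))
```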

Since the mapping scores are, in part, a function of read depth, it's not
too unusual to get very high quality scores in the MAQ output.

Here's a simple snippet that is valid fastq:

@ref|NC_001133|
nnnnnnnnnnnnnnnacacccacacaccacaccacacaccACACCACACCCACACACACA
CATCCTAACACTACCCTAACACAGCCctaatcyaacCCTGACCAACCTGTCTCTCAACTT
+
!!!!!!!!!!!!!!!@EHHHHHHKKJKKKKNNNBN:NNNNQQQQQABGA?LTTWWWZZZI
HEFBZLZ]]]]]]]]]ZZZZZT@TTQQQT4A]1?cfiloxL{xuuux{]~~~~~Ake~`~

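To make the issue concrete, here is a rough sketch that decodes the tail of
the second quality line above using the Sanger offset of 33 and checks the
scores against both upper limits. It is plain Python rather than the actual
SeqIO.QualityIO code, and the helper name out_of_range is made up for this
example.

```python
# Tail of the second quality line from the record above.
quality_tail = "cfiloxL{xuuux{]~~~~~Ake~`~"

# Sanger-style FASTQ stores each score as the ASCII character chr(PHRED + 33).
SANGER_OFFSET = 33

scores = [ord(ch) - SANGER_OFFSET for ch in quality_tail]


def out_of_range(values, upper):
    """Return the scores that fall outside the inclusive range 0..upper."""
    return [q for q in values if not 0 <= q <= upper]


rejected_with_90 = out_of_range(scores, 90)  # the old 0-90 limit
rejected_with_93 = out_of_range(scores, 93)  # the 0-93 limit from the MAQ page

print("highest score:", max(scores))
print("rejected with limit 90:", len(rejected_with_90))
print("rejected with limit 93:", len(rejected_with_93))
```

The seven "~" characters decode to 93, so this fragment fails a 0-90 check
but passes a 0-93 check.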

Thanks,
Paul M

