[Bioperl-l] fastq parsing problem

Michael Muratet mmuratet at hudsonalpha.org
Tue May 12 14:31:21 UTC 2009


On May 9, 2009, at 5:55 AM, John Marshall wrote:

> Michael Muratet wrote:
>> I've got a problem parsing fastq output from the maq aligner. The
>> parser is throwing an exception for the following record:
>>
>> @HWI-EAS146:3:1:2:177#0/1
>> CTCCGCTNNCTTCTCAG[...]
>> +
>> @,AB=>-&&:5).;+*=[...]
>>
>> I looked up the line in fastq.pm that does the parsing:
>>
>>    116   my ($top,$sequence,$top2,$qualsequence) = [...]
>
> This is the fastq parser from 1.5.2 or thereabouts, which had a bug (the
> $/ definition just above this code) that prevented it from parsing a
> record with a quality line starting with "@".  This was probably not
> recognised as a bug for a long time due to the enduring myth that fastq
> quality lines always start with "!".
>
> The fastq next_seq() was rewritten for 1.6.0 and parses this successfully.
> (Unfortunately the documentation at the top of fastq.pm was not updated
> and still reflects the now-unused false belief about an initial "!"
> quality.)
>
> You may be able to just drop 1.6.0's Bio/SeqIO/fastq.pm in front of your
> existing Bioperl installation, if you're a little crazy and don't want to
> update the installation properly.  If you do that, or if you update,
> you'll find that the new parser emits the following pedantic warning for
> your fastq sequences:
>

John

I did install 1.6.0 (which is very smooth, my compliments to the
chefs), and it solved the problem except for the warning you note,
which Chris Fields fixed this morning.

Thanks for the help.

Mike

> MSG: Seq/Qual descriptions don't match; using sequence description
>
> In practice, lots of people (probably even most!) don't bother putting the
> sequence id on the "+" line, as it is entirely pointless duplication,
> instead leaving the "+" line otherwise empty.  So I hope the maintainers
> agree that this warning should be relaxed, such as in the attached patch.
> Or even removed -- there was no equivalent warning in the previous code.
>
> Cheers,
>
>    John
>
>
>
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> <qualdesc.diff>
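
For anyone reading this thread later, here is a minimal sketch of the two
points John raises: reading records by line position, so that a quality line
starting with "@" is not mistaken for a new record, and warning about the "+"
line only when it carries a description that differs from the header. It is
written in Python for illustration. It is not the code in Bio/SeqIO/fastq.pm,
nor the contents of qualdesc.diff. The function name parse_fastq and the
sample record are hypothetical, and the sketch assumes every record occupies
exactly four lines (no wrapped sequence or quality lines).

```python
import warnings


def parse_fastq(lines):
    """Yield (seq_id, sequence, quality) tuples from four-line FASTQ records.

    Assumption: sequence and quality are each on a single line, so a record
    is always header, sequence, "+" line, quality, in that order.
    """
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("truncated FASTQ input: expected 4 lines per record")

    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]

        # Only the first and third lines are checked for their markers.
        # The quality line is taken by position, so it may start with "@".
        if not header.startswith("@"):
            raise ValueError("record header must start with '@': %r" % header)
        if not plus.startswith("+"):
            raise ValueError("third line must start with '+': %r" % plus)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")

        seq_id = header[1:]
        plus_id = plus[1:]

        # Relaxed check: an empty "+" line is fine; warn only on a mismatch.
        if plus_id and plus_id != seq_id:
            warnings.warn(
                "Seq/Qual descriptions don't match; using sequence description"
            )

        yield seq_id, seq, qual


if __name__ == "__main__":
    # Hypothetical record whose quality line begins with "@".
    sample = ["@read1\n", "ACGTN\n", "+\n", "@,AB=\n"]
    for record in parse_fastq(sample):
        print(record)
```

Splitting the input on "@" at the start of a line, rather than counting
lines, is the kind of approach that breaks on a record like the one above.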



