[Bioperl-l] fastq parsing problem

Michael Muratet mmuratet at hudsonalpha.org
Fri May 8 19:29:38 UTC 2009


I've got a problem parsing fastq output from the maq aligner. The  
parser is throwing an exception for the following record:


I looked up the statement in fastq.pm (lines 116-121) that does the parsing:

    my ($top,$sequence,$top2,$qualsequence) = $entry =~ /^
                                                        \@?(.
                                                        ([^
                                                        \+?(.
                                                        (.*)\n
                                                      /xs

I don't consider myself a regex-pert, but I would interpret the above  
as "put everything after one or zero @ characters on the first line in  
$top; then put anything that is not @ on the second line in $sequence;  
then everything after one or zero + characters on the third line in  
$top2; then everything on the fourth line in $qualsequence; and don't  
be greedy".
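The reading above can be tried out directly. The sketch below uses a hypothetical helper, `parse_fastq_entry`, whose pattern is written to follow that plain-English reading: an optional `@` and a header line, a sequence line with no `@`, an optional `+` and a second header line, then the quality line. It is an assumption, not the exact pattern from fastq.pm (whose lines are cut off above), and the record fed to it is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern follows the plain-English reading of the
# fastq.pm regex, NOT the module's exact (truncated) code.
#   line 1: optional '@', then the header (non-greedy)
#   line 2: the sequence, which may not contain '@'
#   line 3: optional '+', then an optional second header (non-greedy)
#   line 4: everything that remains is the quality string
# /x allows the spread-out layout and /s lets '.' match newlines, as in the original.
sub parse_fastq_entry {
    my ($entry) = @_;
    my ($top, $sequence, $top2, $qualsequence) = $entry =~ /^
        \@?(.*?)\n
        ([^\@]*?)\n
        \+?(.*?)\n
        (.*)\n
    /xs or return;    # empty list when the entry does not match
    return ($top, $sequence, $top2, $qualsequence);
}

# Made-up record whose quality line starts with '@' (Phred+33 quality 31).
my @fields = parse_fastq_entry("\@read2\nACGT\n+\n\@III\n");
print join('|', @fields), "\n";    # prints "read2|ACGT||@III"
```

Under this reading, an intact record whose quality line begins with `@` still matches, so one possibility worth checking is whether the failing records reach the regex intact. If the module cuts records out of the file at a newline followed by `@` (an assumption; that part of fastq.pm is not quoted here), a quality line starting with `@` would be taken as the start of a new record.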

It seems like the fastq record above should parse with these rules. I
note that the @ character, which is escaped in the regex, appears in
several of the problem records, but not all of them. Has anyone come
across this before? I don't see this exact problem in the list archives.


