[Bioperl-l] SeqIO & multi-line fastq

Joel Martin j_martin at lbl.gov
Fri Nov 7 22:45:34 UTC 2008


Hello,
   Multi-line fastq seems broken by design: '@' is both a valid quality
character and the id delimiter, so a quality line can look like the start
of a new record.  The fastq-to-fasta conversion script that accompanies maq
can't parse the multi-line fastq that maq itself outputs, so I'd say it's
maq that's wrong.
   I did this to parse them, but wasn't sure enough about /^\+/ to
suggest it for bioperl.

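use strict;
use warnings;

# input fastq file is taken from the first command-line argument
open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";

while (<$fh>) {
  if (/^@(\S+)/) {         # read name
    print ">$1\n";

    my $lines = 0;

    while (<$fh>) {        # read sequence
      last if /^\+/;       # stop at '+' line
      print;
      $lines++;
    }
    # skip quals; assumes they wrap onto as many lines as the sequence did
    while ($lines--) {
      <$fh>;
    }
  }
}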
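
The loop above skips as many quality lines as it saw sequence lines, so it relies on both blocks being wrapped identically. A minimal sketch of a more defensive variant follows, assuming only that each record's quality string has exactly as many characters as its sequence. It counts characters instead of lines, so a quality line that happens to start with '@' or '+' is never mistaken for a header or separator. The sub name fastq_to_fasta and the sample reads are made up for illustration; this is not what Bio::SeqIO::fastq does.

```perl
use strict;
use warnings;

# Convert multi-line fastq read from $in into fasta written to $out.
# Assumption: quality length equals sequence length for every record.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        next unless $line =~ /^@(\S+)/;    # record header
        print {$out} ">$1\n";

        # Sequence lines run up to the first line starting with '+'.
        # A sequence line never starts with '+', so this test is safe here.
        my $seq_len = 0;
        while (defined($line = <$in>)) {
            last if $line =~ /^\+/;
            $line =~ s/\s+\z//;            # drop newline / trailing blanks
            $seq_len += length $line;
            print {$out} "$line\n";
        }

        # Consume quality lines until they cover the whole sequence.
        my $qual_len = 0;
        while ($qual_len < $seq_len && defined($line = <$in>)) {
            $line =~ s/\s+\z//;
            $qual_len += length $line;
        }
    }
    return;
}

# Hypothetical two-record input; the first record has quality lines
# that begin with '@' and '+'.
my $fastq = join "\n",
    '@read1', 'ACGT', 'ACG', '+', '@III', '+II',
    '@read2', 'GG', '+read2', 'II', '';
open my $in, '<', \$fastq or die "Can't open in-memory input: $!";
my $fasta = '';
open my $out, '>', \$fasta or die "Can't open in-memory output: $!";
fastq_to_fasta($in, $out);
close $out;
print $fasta;
```

For a real file, the same sub can be called with a handle opened on the fastq file and \*STDOUT as the output handle.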

Joel

On Fri, Nov 07, 2008 at 03:59:07PM -0500, Tristan Lefebure wrote:
> Hi there,
> 
> I'm using SeqIO to parse a FastQ file made by MAQ. SeqIO complains because
> this is a multi-line fastq file. Looking at Bio::SeqIO::fastq, it's
> pretty obvious that it can't handle multi-line records. Who is wrong? MAQ,
> SeqIO, or am I missing something?
> 
> Some more details below:
> 
> ###
> [tristan at trudy maq_easyrun] seq2seq.pl cns.fq fastq cns.fna fasta
> 
> ------------- EXCEPTION -------------
> MSG: AACTATTTATCAAATTTAAAATTCAACGAAAAACAAAGCAAAGCAGATCTTTTAGTTTTT
> doesn't match fastq descriptor line type
> STACK
> Bio::SeqIO::fastq::next_seq /usr/local/share/perl/5.10.0/Bio/SeqIO/fastq.pm:113
> STACK toplevel /home/tristan/bin/seq2seq.pl:25
> -------------------------------------
> ###
> 
> The fastq file looks like this:
> -----------
> @nctc11168
> atgAATCCAAGCCAAATACTTGAAAATTTAAAAAAAGAATTAAGTGAAAACGAATACGAA
> AACTATTTATCAAATTTAAAATTCAACGAAAAACAAAGCAAAGCAGATCTTTTAGTTTTT
> AATGCTCCAAATGAACTCATGGCTAAATTCATACAAACAAAATACGGCAAAAAAATCGCG
> CATTTTTATGAAGTGCAAAGCGGAAATAAAGCCATCATAAATATACAAGCACAAAGTGCT
> AAACAAAGCAACAAAAGCACAAAAATCGACATAGCTCATATAAAAGCACAAAGCACGATT
> TTAAATC[...]
> [some 20000 lines later]
> AACCTTTTTTTATAAAATTTAAGATAAAATTTATACATTATGCAAAATTTAAAGAGAgat
> n
> +
> EQWWZ`cffilmu~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~[...]
> ---------
> 
> Thanks!
> 
> -Tristan