[Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality?

Joseph Fass joseph.fass at gmail.com
Thu Mar 20 22:10:33 UTC 2008


I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:

$seq->seq($seq->subseq($a+1,$len-$b));
and, if it's fastq:
$seq->qual($seq->subqual($a+1,$len-$b));
where:
$len = $seq->length; # defined before changing $seq->seq
$a is the number of bases to trim off the beginning of the sequence
$b is the number of bases to trim off the end of the sequence

The code works for sequences, but for qualities I get a trimmed series of
quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of the
series changed to '!' ... i.e.:

@fake header 1
tcggacaatatatat
+
fjasfiojeq%!@%@

becomes:

@fake header 1 trimmed by 4 at beginning and 3 at end
acaatata
+fake header 1 trimmed by 4 at beginning and 3 at end
fioj!!!!

Since the relevant section of code is short, I'll post it:

my $in = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
my $out = Bio::SeqIO->new(-file=> ">$opt_o", -format => $format);
my $seq_length;
while (my $seq = $in->next_seq()) {
  $seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at
end");
  $seq_length = $seq->length;
  $seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
  if ($format eq 'fastq') { # if fastq, trim qualities then write out in
fastq format
    $seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
    $out->write_fastq($seq); }
  else {$out->write_seq($seq);} # just write out sequence in fasta format
}

Why should the same process work for ->seq and ->subseq but not ->qual and
->subqual?  Please enlighten me ...


-- 
Joseph Fass
jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional)
970.227.5928 (c) || 530.752.2698 (w)



More information about the Bioperl-l mailing list