[Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality?
Joseph Fass
joseph.fass at gmail.com
Thu Mar 20 22:10:33 UTC 2008
I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:
$seq->seq($seq->subseq($a+1,$len-$b));
and, if it's fastq:
$seq->qual($seq->subqual($a+1,$len-$b));
where:
$len = $seq->length; # defined before changing $seq->seq
$a is the number of bases to trim off the beginning of the sequence
$b is the number of bases to trim off the end of the sequence
The code works for sequences, but for qualities I get a trimmed series of
quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of the
series changed to '!' ... i.e.:
@fake header 1
tcggacaatatatat
+
fjasfiojeq%!@%@
becomes:
@fake header 1 trimmed by 4 at beginning and 3 at end
acaatata
+fake header 1 trimmed by 4 at beginning and 3 at end
fioj!!!!
Since the relevant section of code is short, I'll post it:
my $in = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
my $out = Bio::SeqIO->new(-file=> ">$opt_o", -format => $format);
my $seq_length;
while (my $seq = $in->next_seq()) {
$seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at
end");
$seq_length = $seq->length;
$seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
if ($format eq 'fastq') { # if fastq, trim qualities then write out in
fastq format
$seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
$out->write_fastq($seq); }
else {$out->write_seq($seq);} # just write out sequence in fasta format
}
Why should the same process work for ->seq and ->subseq but not ->qual and
->subqual? Please enlighten me ...
--
Joseph Fass
jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional)
970.227.5928 (c) || 530.752.2698 (w)
More information about the Bioperl-l
mailing list