[Bioperl-l] Sequence qual values
Charles Hauser
chauser@duke.edu
23 Sep 2002 11:17:03 -0400
Hi,
As part of an EST project, I would like to trim sequences based on their
qual values.
Using a window size that is 10% of the sequence length, I want to progress
along the seq (advancing the window by 1 nt each round) and
calculate the mean qual for each window.
With the mean qual data I want to trim the 5' and 3' ends of the sequence
where the window mean is below a cutoff value (i.e. keep window qual >= 20).
So, in the case below, I would trim the seq to windows 3 <-> 6.
              |.........|
qual:    5 12 30 36 59 21  8  6
window:  1  2  3  4  5  6  7  8
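
The windowing described above can be sketched as follows. This is only an illustration in Python rather than Perl, and it is not a BioPerl routine: the names `window_means`, `trim_bounds`, `window_percent` and `cutoff` are hypothetical. It assumes the window size is rounded down to a whole number of bases (minimum 1), and that the kept region runs from the first base of the first passing window to the last base of the last passing window, so low-quality windows in the interior are not removed.

```python
def window_means(quals, window):
    """Mean quality of every window of `window` bases, stepping by 1 base."""
    if window < 1 or window > len(quals):
        return []
    total = sum(quals[:window])
    means = [total / window]
    # Slide the window: add the base entering, drop the base leaving.
    for i in range(window, len(quals)):
        total += quals[i] - quals[i - window]
        means.append(total / window)
    return means


def trim_bounds(quals, window_percent=10, cutoff=20):
    """Return (start, end) as a 0-based, end-exclusive slice to keep,
    or None if no window reaches the cutoff."""
    # Window is window_percent of the read length, rounded down, at least 1.
    window = max(1, len(quals) * window_percent // 100)
    means = window_means(quals, window)
    good = [i for i, m in enumerate(means) if m >= cutoff]
    if not good:
        return None
    start = good[0]            # first base of first passing window
    end = good[-1] + window    # one past last base of last passing window
    return start, end


# The 8-value example: the window is 1 base, so each value is its own window.
bounds = trim_bounds([5, 12, 30, 36, 59, 21, 8, 6])  # -> (2, 6), i.e. windows 3..6
```

The returned pair can be used directly as a slice on both the sequence string and the list of qual values, which keeps the two in step after trimming.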
Data Formats: (separate files)
Qual data:
>1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1
8 8 8 8 8 6 6 6 6 6 8 8 8 11 19 12 10 10 11 11 12 12
Seq data format (fasta):
>1112026H03.x1 CHROMAT_FILE: 1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1 CHEM:
GTCTGCTGAACTACACTACGGTCGAAGGGGAACGGGCCCCCACTCGACAT
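
Since the qual and seq data live in separate files, the two need to be paired by read name before trimming. A rough sketch of that step, again in plain Python and not using any BioPerl module: `parse_records`, `load_quals` and `load_seqs` are hypothetical helpers. It assumes each record's header sits on a single line starting with ">" and that the first whitespace-delimited token after ">" is the read name shared by both files.

```python
def parse_records(lines):
    """Yield (read_id, body_lines) for each '>'-headed record."""
    read_id, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, body
            tokens = line[1:].split()
            # Read name is assumed to be the first token of the header.
            read_id, body = (tokens[0] if tokens else ""), []
        elif read_id is not None:
            body.append(line)
    if read_id is not None:
        yield read_id, body


def load_quals(lines):
    """Map read name -> list of integer qual values."""
    return {rid: [int(q) for row in body for q in row.split()]
            for rid, body in parse_records(lines)}


def load_seqs(lines):
    """Map read name -> sequence string (wrapped lines joined)."""
    return {rid: "".join(body) for rid, body in parse_records(lines)}


quals = load_quals([
    ">1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1",
    "8 8 8 8 8 6 6 6 6 6 8 8 8 11 19 12 10 10 11 11 12 12",
])
seqs = load_seqs([
    ">1112026H03.x1 CHROMAT_FILE: 1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1 CHEM:",
    "GTCTGCTGAACTACACTACGGTCGAAGGGGAACGGGCCCCCACTCGACAT",
])
```

Both functions accept any iterable of lines, so an open file handle can be passed in place of the lists used here.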
Looking through the docs I see there is a module for reading qual
values (Bio::Seq::PrimaryQual).
Before diving in, I thought I would check if anyone else has done
something similar, and if so what their approach has been.
regards,
Charles