[Bioperl-l] Sequence qual values
Hilmar Lapp
hlapp@gnf.org
Mon, 23 Sep 2002 09:48:34 -0700
The algorithm I've used is to compute the highest scoring region based on a match-reward (qualval for a base exceeds threshold) and mismatch-penalty (qualval of a base is lower or equal to threshold) scheme. Computing this is simple and a single pass, worked very well for me, gives you a score and flexible stringency, and I believe is the same algorithm phrap uses (at least that's what I found in the phred/phrap docs).
-hilmar
> -----Original Message-----
> From: Charles Hauser [mailto:chauser@duke.edu]
> Sent: Monday, September 23, 2002 8:17 AM
> To: BioPerl-List
> Subject: [Bioperl-l] Sequence qual values
>
>
> Hi,
>
> As part of an EST project, I would like to trim sequences
> based on their
> qual values .
>
>
> Using a window size that is 10% sequence length, I want to progress
> along the seq (incrementing the window by 1 nt w/each round) and
> calculate the mean qual for the window.
>
> With the mean qual data I want to trim the 5' and 3' sequences whose
> qual values are below a cutoff value (window qual >= 20).
>
> So, in the case below, I would trim the seq to windows 3 <-> 6.
>
> |........................|
>
> qual: 5 12 30 36 59 21 8 6
>
> window: 1 2 3 4 5 6
> 7 8
>
>
>
> Data Formats: (separate files)
>
> Qual data:
>
> >1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1
> 8 8 8 8 8 6 6 6 6 6 8 8 8 11 19 12 10 10 11 11 12 12
>
>
> Seq data format (fasta):
>
> >1112026H03.x1 CHROMAT_FILE: 1112026H03.x1 PHD_FILE:
> 1112026H03.x1.phd.1 CHEM:
> GTCTGCTGAACTACACTACGGTCGAAGGGGAACGGGCCCCCACTCGACAT
>
> Looking through the doc's I see there is a module for reading qual
> values (Bio::Seq::PrimaryQual;).
>
> Before diving in, I thought I would check if anyone else has done
> something similar, and if so what their approach has been.
>
>
> regards,
>
> Charles
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>