[Bioperl-l] Sequence qual values

Mon, 23 Sep 2002 09:48:34 -0700

The algorithm I've used computes the highest-scoring region under a match-reward/mismatch-penalty scheme: a base earns the reward if its qual value exceeds the threshold, and incurs the penalty if its qual value is at or below the threshold. It is simple to compute in a single pass, has worked very well for me, gives you a score and adjustable stringency, and I believe it is the same algorithm phrap uses (at least that's what I found in the phred/phrap docs).
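
A minimal sketch of that idea in Python, for illustration only. The function name, the reward/penalty values, and the default threshold are assumptions made for this example; they are not taken from phrap or from BioPerl. The scan keeps a running score, restarts whenever the score drops to zero or below, and remembers the best-scoring stretch seen so far.

```python
def best_quality_region(quals, threshold=20, reward=1, penalty=2):
    """Return (start, end, score) for the highest-scoring region.

    start is a 0-based index and end is exclusive, so quals[start:end]
    is the region to keep. reward and penalty are illustrative values.
    """
    best_score = 0
    best_start = best_end = 0
    run_score = 0
    run_start = 0
    for i, q in enumerate(quals):
        # Reward bases above the threshold, penalize the rest.
        run_score += reward if q > threshold else -penalty
        if run_score <= 0:
            # The current run is no longer worth extending; restart after i.
            run_score = 0
            run_start = i + 1
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end, best_score


quals = [5, 12, 30, 36, 59, 21, 8, 6]
start, end, score = best_quality_region(quals, threshold=20)
print(start, end, score)  # 2 6 4, i.e. positions 3 to 6 counting from 1
```

Raising the penalty relative to the reward makes the trimming stricter, because a single low-quality base then costs more good bases to bridge.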

	-hilmar

> -----Original Message-----
> From: Charles Hauser [mailto:chauser@duke.edu]
> Sent: Monday, September 23, 2002 8:17 AM
> To: BioPerl-List
> Subject: [Bioperl-l] Sequence qual values
> 
> 
> Hi,
> 
> As part of an EST project, I would like to trim sequences based on their
> qual values.
> 
> 
> Using a window size that is 10% sequence length, I want to progress
> along the seq (incrementing the window by 1 nt w/each round) and
> calculate the mean qual for the window.  
> 
> With the mean qual data I want to trim the 5' and 3' ends whose window
> means fall below a cutoff value (i.e. keep windows with mean qual >= 20).
> 
> So, in the case below, I would trim the seq to windows 3 <-> 6.
> 
>                   |............|
> qual:      5  12  30  36  59  21   8   6
> window:    1   2   3   4   5   6   7   8
> 
> 
> 
> Data Formats: (separate files)
> 
> Qual data:
> 
> >1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1
> 8 8 8 8 8 6 6 6 6 6 8 8 8 11 19 12 10 10 11 11 12 12
> 
> 
> Seq data format (fasta):
> 
> >1112026H03.x1  CHROMAT_FILE: 1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1 CHEM:
> GTCTGCTGAACTACACTACGGTCGAAGGGGAACGGGCCCCCACTCGACAT
> 
> Looking through the docs I see there is a module for reading qual
> values (Bio::Seq::PrimaryQual).
> 
> Before diving in, I thought I would check if anyone else has done
> something similar, and if so what their approach has been.
> 
> 
> regards,
> 
> Charles
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
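
For comparison, the sliding-window approach described in the quoted message could be sketched roughly as follows. This is an illustration under assumptions, not existing BioPerl code: the function name and parameters are hypothetical, and the window size is taken as 10% of the sequence length, rounded down, with a minimum of one base. The window advances one base at a time, and the kept region runs from the start of the first passing window to the end of the last passing window.

```python
def window_trim(quals, cutoff=20, window_frac=0.10):
    """Return (start, end) of the region to keep; end is exclusive.

    A window passes when its mean qual is >= cutoff. Returns (0, 0)
    when no window passes. All names here are illustrative.
    """
    n = len(quals)
    if n == 0:
        return 0, 0
    w = max(1, int(n * window_frac))
    total = sum(quals[:w])  # sum of the first window
    passing = []
    for start in range(n - w + 1):
        if start > 0:
            # Slide by one base: add the new base, drop the old one.
            total += quals[start + w - 1] - quals[start - 1]
        # Compare sums instead of means to avoid division.
        if total >= cutoff * w:
            passing.append(start)
    if not passing:
        return 0, 0
    return passing[0], passing[-1] + w


quals = [5, 12, 30, 36, 59, 21, 8, 6]
print(window_trim(quals))  # (2, 6): with 8 bases the window is a single base
```

Note that with wider windows the kept region can include a few low-quality bases at either end, since a window may pass on its mean while its outermost base is below the cutoff.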