[Biopython] still more questions about NGS sequence trimming
Kiss, Csaba
csaba.kiss at lanl.gov
Thu Oct 25 15:34:46 UTC 2012
I believe mothur checks the moving average quality of a sequence using a sliding window of 50 bp. If the window average falls below the given value, it tosses the sequence out. I don't think it does any end trimming besides removing the lowercase letters from the ends.
Of course, it can also remove adapter and primer sequences, but that's not based on quality values.
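
To make the sliding-window idea concrete, here is a rough sketch in plain Python of the behaviour as described above. It is not mothur's code: the function name passes_window_quality and the min_avg threshold of 35 are made up for illustration, and only the 50 bp window comes from the description. Reads shorter than one window are judged on their overall average, which is also an assumption.

```python
def passes_window_quality(quals, window=50, min_avg=35):
    """Return True if no window of `window` scores averages below `min_avg`.

    `quals` is a list of integer Phred scores for one read.
    The name and the default threshold are illustrative assumptions.
    """
    if not quals:
        return False
    if len(quals) < window:
        # Assumption: a read shorter than one window is judged as a whole.
        return sum(quals) / len(quals) >= min_avg
    limit = min_avg * window
    total = sum(quals[:window])  # sum of the first window
    if total < limit:
        return False
    for i in range(window, len(quals)):
        # Slide the window one base: add the new score, drop the oldest.
        total += quals[i] - quals[i - window]
        if total < limit:
            return False
    return True


good_read = [40] * 100
bad_read = [40] * 60 + [10] * 50  # quality collapses towards the 3' end
print(passes_window_quality(good_read), passes_window_quality(bad_read))
```

Comparing running sums against min_avg * window avoids recomputing an average for every position.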
C
-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock at googlemail.com]
Sent: Thursday, October 25, 2012 9:30 AM
To: Kiss, Csaba
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] still more questions about NGS sequence trimming
On Thu, Oct 25, 2012 at 3:49 PM, Kiss, Csaba <csaba.kiss at lanl.gov> wrote:
> Thanks, Peter. I am writing my quality functions. Another question
> about trimming. As you mentioned, the quality of the ends tends to be
> lower than in the middle. Could that be fixed just by using "sff-trim"
> when I create my FASTQ file? If I don't do that, I get sequences with lowercase and uppercase letters.
> Are you suggesting further trimming than just "sff-trim"?
In Bio.SeqIO, we use the file format names "sff" and "sff-trim" to mean, respectively, the raw sequence data from the SFF file in full, or the same data with the trimming values stored inside the SFF file applied. If you have used the Roche tools, you'll see a similar option in their SFF extraction tool. This default trimming is decided by the Roche 454 instrument and does quite a good job of removing the adapters, barcodes and poor-quality regions.
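
As an illustration of what that trimming amounts to, here is a small plain-Python sketch. With the "sff" format the clipped regions show up as lowercase letters at the ends of the read, so removing the lowercase ends (and the matching quality scores) gives roughly what "sff-trim" would return. The function name trim_lowercase_ends and the example read are hypothetical; this is not how Bio.SeqIO does it internally, since it uses the clip positions stored in the SFF file directly.

```python
def trim_lowercase_ends(seq, quals):
    """Drop leading and trailing lowercase bases and their quality scores.

    Hypothetical helper for illustration only; uppercase bases in the
    middle of the read are never touched.
    """
    start = 0
    while start < len(seq) and seq[start].islower():
        start += 1
    end = len(seq)
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end], quals[start:end]


# Made-up read: 4 clipped bases, 8 kept bases, 2 clipped bases.
raw_seq = "tcagACGTACGTnn"
raw_quals = [20, 20, 20, 20, 38, 38, 37, 36, 36, 35, 30, 28, 12, 10]
trimmed_seq, trimmed_quals = trim_lowercase_ends(raw_seq, raw_quals)
print(trimmed_seq, trimmed_quals)
```

In practice, reading the file with the "sff-trim" format name does this for you.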
I assume you were using Mothur to do further trimming based on a more stringent sliding window of quality scores?
Peter