[Open-bio-l] Best practices for quality trimming?

Cook, Malcolm MEC at stowers.org
Thu Dec 3 15:00:23 UTC 2009


Surprisingly, I've seen no mention on SEQanswers of fastx-toolkit ("a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing").

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

We have had good experience with it for quality trimming, quality reporting, adaptor removal and duplicate "collapsing" of Illumina reads, and it integrates with Galaxy.
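Duplicate "collapsing" means that identical read sequences are merged into a single record that carries a count. The Python sketch below only illustrates that idea and is not fastx-toolkit's code. It works on bare sequences, so quality strings are dropped, and the `>rank-count` FASTA header it writes is a hypothetical layout chosen for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse_reads(sequences: Iterable[str]) -> List[Tuple[str, int]]:
    """Merge identical read sequences into (sequence, count) pairs.

    Pairs come back ordered by descending count. Ties keep first-seen
    order because Counter.most_common sorts stably.
    """
    return Counter(sequences).most_common()


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Render collapsed reads as FASTA.

    The ">rank-count" header is a hypothetical layout for this sketch only.
    """
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
    # Prints >1-3 ACGT, >2-2 TTGA, >3-1 GGCC as three FASTA records.
    print(to_fasta(collapse_reads(reads)), end="")
```

Collapsing before mapping means that each distinct sequence is aligned only once. The count can then be carried through to any downstream tallies.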

Any EMBOSS integration might seek to provide similar capabilities.

Malcolm Cook
Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: open-bio-l-bounces at lists.open-bio.org 
> [mailto:open-bio-l-bounces at lists.open-bio.org] On Behalf Of Peter
> Sent: Thursday, December 03, 2009 8:29 AM
> To: Dan Bolser
> Cc: open-bio-l at lists.open-bio.org
> Subject: Re: [Open-bio-l] Best practices for quality trimming?
> 
> On Thu, Dec 3, 2009 at 1:12 PM, Dan Bolser 
> <dan.bolser at gmail.com> wrote:
> > Is there a Standard Operating Procedure (SOP) for quality
> > trimming reads, i.e. which tool, what settings and for what purpose?
> >
> > It seems that, when using a window, the median quality of the window
> > should be used as the threshold for deciding where to 'end clip'
> > sequences.
> >
> > Is there a database of the assemblers, for example, that do or don't
> > take quality information into account when assembling?
> 
> Hi Dan,
> 
> It was nice to say hello again in Edinburgh this week:
> http://www.sbforum.org/earchive.php?e_id=79
> 
> As the group discussed, this is tricky, especially as it will
> depend greatly on what you are going to do with the reads next
> (e.g. assembly or mapping onto a reference) and with which tools.
> For Velvet, trimming seems to help (especially in terms of
> reducing the memory demands).
> 
> If we can settle on a reasonable set of procedures, it would
> be great to have implementations in EMBOSS (i.e. this could
> be the "quaffle" tool Peter Rice has suggested) plus BioPerl,
> Biopython etc. The latter would be especially useful as starting
> points for people to modify the algorithm and try new ideas.
> 
> See also:
> http://lists.open-bio.org/pipermail/emboss/2009-December/003788.html
> 
> > I'm working on a software database for NGS tools here:
> >
> > http://seqwiki.com
> >
> > (It's still quite beta, and at some point it may move to 
> > http://bifx.org/wiki)
> 
> Currently it points at http://seqanswers.com/wiki/SEQanswers
> which is perhaps a good idea given SEQanswers' good reputation.
> 
> Peter C.
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
> 
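Dan's suggestion in the quoted message uses the median quality of a sliding window to decide where to end-clip a read. A few lines of Python can sketch it. This is a hypothetical illustration and not an agreed SOP or the behaviour of any particular tool. Several choices are assumptions:

* the window size of 5
* the Q20 threshold
* the Phred+33 offset (Illumina 1.3+ files of this period used +64)
* the cut-point rule, which cuts just before the first below-threshold base inside the first failing window

```python
from statistics import median
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores.

    offset=33 assumes Sanger-style encoding; pass 64 for Illumina 1.3+ data.
    """
    return [ord(ch) - offset for ch in qual]


def clip_point(quals: List[int], window: int = 5, threshold: float = 20.0) -> int:
    """Return how many leading (5') bases to keep.

    Windows are scanned from the 5' end. At the first window whose median
    is below `threshold`, the read is cut just before the first base in
    that window that is itself below `threshold`. Reads shorter than the
    window are judged on a single window spanning the whole read.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(quals)
    if n == 0:
        return 0
    w = min(window, n)
    for start in range(n - w + 1):
        if median(quals[start:start + w]) < threshold:
            # If the median is below the threshold, at least one base in
            # the window must be too, so this inner loop always returns.
            for i in range(start, start + w):
                if quals[i] < threshold:
                    return i
    return n


def end_clip(seq: str, qual: str, window: int = 5,
             threshold: float = 20.0, offset: int = 33) -> Tuple[str, str]:
    """Trim a read and its quality string to the same 3' cut point."""
    keep = clip_point(phred_scores(qual, offset), window, threshold)
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # '?' is Q30 and '+' is Q10 under Phred+33.
    print(end_clip("ACGTACGTAC", "??????++++"))
```

Using the median rather than the mean means that a single low-quality base inside an otherwise good stretch (e.g. `[30, 30, 30, 5, 30, 30, 30, 30]`) does not trigger a clip. A sustained run of poor bases at the 3' end does.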


