[Bioperl-l] Sequence qual values
Andrew Walsh
walsh@cenix-bioscience.com
Tue, 24 Sep 2002 09:31:45 +0200
Hi there,
There is also a Bioperl module (Bio::Tools::Lucy) that gives you
useful methods for dealing with the output from "lucy". If you make
some minor changes to the lucy source and recompile, you also get extra
data on why sequences are being rejected by lucy. We used lucy at my
last job, and I agree with Brian that it's quite fast and easy to use.
The one drawback (from what I remember...) is that it takes a bit of
work to prepare the vector file so that vector sequence gets clipped.
I also think you can only use one vector sequence (?), so it's not as
flexible as crossmatch in that respect.
Andrew
Brian Desany wrote:
>Hi Charles,
>
>For generating trim data, I use "lucy" from TIGR, as it's a lot faster than
>any manual perl solution I've come up with.
>
>It has a pretty flexible trimming algorithm, although I can't guarantee that
>it can do literally what you ask.
>
>http://www.tigr.org/software/#sf
>
>As for the actual trimming, lucy comes with an awk script that does it
>for you. From a bioperl standpoint, I don't know if there's a good
>general-purpose way to provide trim coordinates and a seq/qual pair and
>get a trimmed seq/qual pair out the other end.
>
>-Brian.
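The "trim coordinates plus seq/qual pair in, trimmed pair out" step Brian describes can be sketched as below. This is a rough Python illustration, not Bioperl code and not what lucy's awk script does; the function name trim_pair and the 0-based, half-open coordinate convention are assumptions made for the example.

```python
def trim_pair(seq, quals, start, end):
    """Return (seq, quals) cut to the half-open, 0-based range [start, end).

    Hypothetical helper: the name and coordinate convention are assumptions.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list differ in length")
    if not 0 <= start <= end <= len(seq):
        raise ValueError("trim coordinates fall outside the sequence")
    # Slice both with the same coordinates so they stay in register.
    return seq[start:end], list(quals[start:end])


trimmed_seq, trimmed_quals = trim_pair(
    "ACGTACGT", [5, 12, 30, 36, 59, 21, 8, 6], 2, 6
)
print(trimmed_seq, trimmed_quals)
```

Checking the lengths up front means a seq/qual pair that has drifted out of sync fails loudly instead of being trimmed silently at the wrong positions.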
>
>
>
>
>>Hi,
>>
>>As part of an EST project, I would like to trim sequences based on
>>their qual values.
>>
>>
>>Using a window size that is 10% of the sequence length, I want to
>>progress along the seq (advancing the window by 1 nt each round) and
>>calculate the mean qual for each window.
>>
>>With the mean qual data I want to trim the 5' and 3' ends whose
>>window means are below a cutoff value (i.e. keep window qual >= 20).
>>
>>So, in the case below, I would trim the seq to windows 3 <-> 6.
>>
>>                 |............|
>>qual:     5  12  30  36  59  21   8   6
>>window:   1   2   3   4   5   6   7   8
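The windowing described above can be sketched roughly as follows. This is Python rather than Bioperl, and the function names are hypothetical. It makes two assumptions that the post does not settle: the window size is rounded down (with a minimum of 1), and every base covered by the first and last passing windows is kept.

```python
def window_means(quals, window):
    """Mean quality of each window of `window` bases, sliding 1 nt at a time."""
    if window < 1 or window > len(quals):
        return []
    total = sum(quals[:window])
    means = [total / window]
    for i in range(window, len(quals)):
        total += quals[i] - quals[i - window]  # slide right by one base
        means.append(total / window)
    return means


def passing_window_range(means, cutoff=20):
    """0-based (first, last) indices of the outermost windows with mean >= cutoff."""
    passing = [i for i, m in enumerate(means) if m >= cutoff]
    if not passing:
        return None
    return passing[0], passing[-1]


def trim_coords(quals, fraction=0.10, cutoff=20):
    """Half-open (start, end) base coordinates to keep, or None if nothing passes."""
    # Assumption: window size is rounded down, but never below 1.
    window = max(1, int(len(quals) * fraction))
    span = passing_window_range(window_means(quals, window), cutoff)
    if span is None:
        return None
    first, last = span
    # Assumption: keep every base covered by the first and last passing windows.
    return first, last + window


# The per-window means from the example in the post.
example = [5, 12, 30, 36, 59, 21, 8, 6]
first, last = passing_window_range(example)
print(first + 1, last + 1)  # 1-based window numbers: prints "3 6"
```

Only the outermost passing windows set the trim points, so a low-quality dip in the middle of the read does not split it. If interior dips should also be cut, the selection step would need to change.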
>>
>>
>>
>>Data Formats: (separate files)
>>
>>Qual data:
>>
>>
>>
>>>1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1
>>8 8 8 8 8 6 6 6 6 6 8 8 8 11 19 12 10 10 11 11 12 12
>>
>>
>>Seq data format (fasta):
>>
>>
>>
>>>1112026H03.x1 CHROMAT_FILE: 1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1 CHEM:
>>GTCTGCTGAACTACACTACGGTCGAAGGGGAACGGGCCCCCACTCGACAT
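Both files above use the same FASTA-style layout: a ">" header line whose first token is the read name, followed by either bases or space-separated quality values. A minimal Python sketch of reading both into dictionaries keyed by read name is shown below. The function name parse_records is hypothetical, and in Bioperl you would presumably use its own parsers instead.

```python
def parse_records(lines, numeric=False):
    """Parse FASTA-style lines into {name: sequence} or, if numeric, {name: [quals]}."""
    records = {}
    name, chunks = None, []

    def finish():
        # Join wrapped lines: integers for qual data, one string for sequence data.
        if numeric:
            return [int(tok) for chunk in chunks for tok in chunk.split()]
        return "".join(chunks)

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = finish()
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = finish()
    return records


qual_text = """>1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1
8 8 8 8 8 6 6 6 6 6 8 8 8 11 19 12 10 10 11 11 12 12
"""
seq_text = """>1112026H03.x1 CHROMAT_FILE: 1112026H03.x1 PHD_FILE: 1112026H03.x1.phd.1 CHEM:
GTCTGCTGAACTACACTACGGTCGAAGGGGAACGGGCCCCCACTCGACAT
"""
quals = parse_records(qual_text.splitlines(), numeric=True)
seqs = parse_records(seq_text.splitlines())
print(sorted(set(quals) & set(seqs)))  # read names present in both files
```

Keying both dictionaries on the read name makes it easy to pair each sequence with its quality values before trimming. The excerpts quoted here are truncated, so their lengths do not match each other.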
>>
>>Looking through the docs, I see there is a module for reading qual
>>values (Bio::Seq::PrimaryQual).
>>
>>Before diving in, I thought I would check if anyone else has done
>>something similar, and if so what their approach has been.
>>
>>
>>regards,
>>
>>Charles
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@bioperl.org
>>http://bioperl.org/mailman/listinfo/bioperl-l