[EMBOSS] Probabilistic versions of needle/water?

Mon Jul 6 14:25:47 UTC 2009

On Mon, Jul 6, 2009 at 1:32 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> I am aware of people using EMBOSS tools (I assume water) to identify
>> (known) adaptor sequences in raw Solexa/Illumina data. I considered
>> doing something similar myself when trying to remove primer sequences
>> from 454 data. Such a pipeline using the current EMBOSS water would be
>> doing this matching at a purely fixed nucleotide level (ignoring the
>> qualities), which isn't ideal. Upgrading to a probabilistic version of
>> water should be an improvement.
>
> Would be interesting.
>
> Where can I look up adaptor calling methods?

The particular example I had in mind was the thread with Giles Weaver
on the BioPerl mailing list, which I see you have just replied to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030404.html

I think I made a typo earlier (needle versus water). If you are
comparing a short but complete adaptor sequence to a read
(which you expect may contain the full adaptor), doing a global
alignment is more sensible than a local one. On re-reading,
Giles did actually say he was using needle:
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030411.html
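
As a rough illustration of the global-versus-local point, here is a
minimal Python sketch. It is not what needle or water do internally:
the function name align_score and the match/mismatch/gap values are
made up for this example, and it ignores base qualities entirely. In
"global" mode the whole adaptor must be aligned, while read bases
before and after it are free; in "local" mode any fragment of the
adaptor may score.

```python
def align_score(adaptor, read, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of adaptor against read (hypothetical helper).

    local=False: every adaptor base must be aligned (global in the
    adaptor); unaligned read bases at either end cost nothing.
    local=True: Smith-Waterman style, best-scoring sub-region only.
    """
    n, m = len(adaptor), len(read)
    prev = [0] * (m + 1)  # row 0: leading read bases are free
    best = 0
    for i in range(1, n + 1):
        # Column 0: adaptor bases aligned against nothing are gaps,
        # unless we are in local mode, where scores never drop below 0.
        cur = [0 if local else prev[0] + gap]
        for j in range(1, m + 1):
            s = match if adaptor[i - 1] == read[j - 1] else mismatch
            score = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            cur.append(score)
        prev = cur
    # Global mode: trailing read bases are free, so take the best
    # score anywhere in the final row.
    return best if local else max(prev)


adaptor = "ACGTACGT"
full_read = "TTTTACGTACGTGGGG"   # contains the whole adaptor
partial_read = "TTTTACGTAC"      # contains only the first six bases

for name, read in [("full", full_read), ("partial", partial_read)]:
    print(name,
          "global:", align_score(adaptor, read),
          "local:", align_score(adaptor, read, local=True))
```

With the full adaptor present both modes give the same score. With
only a fragment present, the local score stays high (it simply reports
the fragment), whereas the global score drops because the missing
adaptor bases have to be paid for, which makes it easier to set a
threshold for "the complete adaptor is in this read".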

Peter