[Bioperl-l] SeqIO: paired end reads

Peter Cock p.j.a.cock at googlemail.com
Sun Aug 7 09:40:52 UTC 2011


On Friday, August 5, 2011, Lee Katz <lskatz at gmail.com> wrote:
> Thank you.  I figured out through the Newbler manual that there is a linker
> sequence to separate the paired end reads.  Then, the forum at
> http://seqanswers.com/forums/showthread.php?t=12940 showed me that the
> linker sequence is "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC".

There is more than one Roche 454 linker sequence, depending on the chemistry
used: one is the same as its reverse complement, one isn't.
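
As a quick illustration of the reverse complement point, here is a minimal
Python sketch (standard library only, not BioPerl code) that checks the
linker quoted above. The function names are made up for this example, and
it assumes an upper-case sequence containing only A, C, G and T.

```python
# Linker sequence copied from the quoted message above.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"

# Translation table for unambiguous upper-case DNA (assumption: no N or IUPAC codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_self_reverse_complement(seq):
    """True if the sequence reads the same on both strands."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(is_self_reverse_complement(LINKER))  # prints True
```

For a linker like this one, a single search on the forward strand is enough.
For a linker that differs from its reverse complement, you would have to
search for both orientations.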

There is nothing in the SFF file format (nor the Roche specific XML manifest
last time I checked) that handles the paired end information explicitly.

> I think a useful addition to bioperl could be to have paired end reads.
>

Maybe, but to do this well you'd want to do flow space alignment of the
reads to the linker sequence to find the imperfectly called linker
sequences.
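
To show why exact string matching is not enough, here is a much simplified
Python sketch. It is NOT flow space alignment: it works in base space and
only tolerates substitutions (no insertions or deletions, so homopolymer
length errors typical of 454 would be missed). The function names and the
default mismatch threshold are assumptions chosen for illustration.

```python
# Linker sequence copied from the quoted message above.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_on_linker(read, linker=LINKER, max_mismatches=3):
    """Split a read at the best approximate linker match.

    Returns (left, right), the bases either side of the linker, or None
    if no window matches within max_mismatches substitutions.
    """
    best_pos, best_dist = None, max_mismatches + 1
    for pos in range(len(read) - len(linker) + 1):
        dist = hamming(read[pos:pos + len(linker)], linker)
        if dist < best_dist:  # keep the first window with the fewest mismatches
            best_pos, best_dist = pos, dist
    if best_pos is None:
        return None
    return read[:best_pos], read[best_pos + len(linker):]


if __name__ == "__main__":
    # Hypothetical read: left part + linker with one miscalled base + right part.
    noisy_linker = LINKER[:5] + "T" + LINKER[6:]
    print(split_on_linker("ACGTACGTAC" + noisy_linker + "TTGACCA"))
```

A real tool would also have to cope with partial linkers at the ends of
reads and with reads containing no linker at all (which this sketch simply
reports as None).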

Personally I use sff_extract, which is a free open source command line tool,
for this (it calls an external alignment tool for paired end 454).

> This is outside of the domain of bioperl, but now I am left wondering how I
> could specify the distance between reads in Newbler, if the linker sequence
> is fixed.

How to do that depends on the alignment or assembly tool you are using.

Peter


