[Bioperl-l] SeqIO: paired end reads

Chris Fields cjfields at illinois.edu
Sun Aug 7 15:51:19 UTC 2011


On Aug 7, 2011, at 4:40 AM, Peter Cock wrote:

> On Friday, August 5, 2011, Lee Katz <lskatz at gmail.com> wrote:
>> Thank you.  I figured out through the Newbler manual that there is a
>> linker sequence to separate the paired end reads.  Then, the forum at
>> http://seqanswers.com/forums/showthread.php?t=12940 showed me that the
>> linker sequence is "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC".
> 
> There is more than one Roche 454 linker sequence, depending on the chemistry
> used; one is the same as its reverse complement, one isn't.
> 
> There is nothing in the SFF file format (nor the Roche specific XML manifest
> last time I checked) that handles the paired end information explicitly.

Yep, it's all implied AFAIK.
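
As a quick check of the reverse-complement point above, here is a minimal Python sketch. The helper names (reverse_complement, is_own_reverse_complement) are made up for illustration, and it assumes plain A/C/G/T with no ambiguity codes. It only tests the one linker quoted earlier in this thread.

```python
# Complement table for unambiguous DNA bases only (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Linker sequence as quoted earlier in this thread.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_own_reverse_complement(seq: str) -> bool:
    """True if the sequence reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)


print(is_own_reverse_complement(LINKER))  # prints True
```

So this particular linker looks the same on either strand, which means a search for it does not need to try both orientations. That shortcut would not hold for a linker that differs from its reverse complement.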

>> I think a useful addition to bioperl could be to have paired end reads.
>> 
> 
> Maybe, but to do this well you'd want to do flow space alignment of the
> reads to the linker sequence to find the imperfectly called linker
> sequences.
> 
> Personally I use sff_extract, which is a free open source command line tool
> for this (calling an external alignment tool for paired end 454).

I think it could be done, but I would implement something like this as a wrapper around faster tools (like sff_extract or similar).  Implementing the functionality in pure (bio)perl/(bio)python doesn't make much sense if there are newer/faster tools out there.
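
To make concrete what the splitting step involves, here is a deliberately naive Python sketch that cuts a read at the first exact copy of the linker. This is not how sff_extract works. The function name split_at_linker is hypothetical, and exact string matching will miss the imperfectly called linkers Peter mentions, which is the reason to hand the job to an alignment-based tool.

```python
from typing import Optional, Tuple

# Linker sequence as quoted earlier in this thread.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_at_linker(read: str, linker: str = LINKER) -> Optional[Tuple[str, str]]:
    """Return the (left, right) flanks around the first exact linker match.

    Returns None when the linker is not found verbatim, which includes
    every read where the linker was called with errors.
    """
    pos = read.upper().find(linker.upper())
    if pos == -1:
        return None
    return read[:pos], read[pos + len(linker):]


# Toy read: two short flanks joined by the linker.
print(split_at_linker("ACGT" + LINKER + "TTGA"))  # prints ('ACGT', 'TTGA')
```

A real implementation would also need to handle reads with a partial linker at one end and reads with very short flanks, none of which is covered here.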

>> This is outside of the domain of bioperl, but now I am left wondering how
>> I could specify the distance between reads in Newbler, if the linker
>> sequence is fixed.
> 
> How to do that depends on the alignment or assembly tool you are using.
> 
> Peter

Yep.  I don't think there is a defined way to specify that in any format that I know of.

chris


