[Biopython] consensus for forward and reverse reads from a sequencing run

Willis, Jordan R jordan.r.willis at Vanderbilt.Edu
Tue Feb 25 02:21:40 UTC 2014


Hi Leo,

I know this is not what you asked, and I'm not sure whether Biopython has a module for it, but I would really recommend pandaseq (https://github.com/neufeld/pandaseq). It's written in C, so it's much faster than Python, and it could hardly be simpler to use. I typically use it for HiSeq and MiSeq runs: it just takes the forward and reverse paired-end reads and outputs a consensus sequence (with PHRED scores if you want).
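With FASTQ input, each base carries a PHRED score, which a consensus can use to resolve disagreements inside the overlap. Below is a minimal sketch of one simple rule: keep whichever base has the higher score. It assumes the overlap length is already known and that the reverse read is in the same orientation as the forward read. The function `quality_consensus` is hypothetical and does not attempt to reproduce pandaseq's actual scoring. With Biopython, the score lists would come from `record.letter_annotations["phred_quality"]` on records read with `SeqIO.parse(path, "fastq")`.

```python
def quality_consensus(fwd, fwd_qual, rev, rev_qual, overlap):
    """Merge a read pair whose last/first `overlap` bases cover the same positions.

    Hypothetical helper: inside the overlap, keep whichever base has the higher
    PHRED score (ties go to the forward read). Outside the overlap, bases and
    scores are copied unchanged. Returns (merged_sequence, merged_qualities).
    """
    if not 0 < overlap <= min(len(fwd), len(rev)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    start = len(fwd) - overlap  # index in fwd where the overlap begins
    seq = list(fwd[:start])
    qual = list(fwd_qual[:start])
    for i in range(overlap):
        f_base, f_q = fwd[start + i], fwd_qual[start + i]
        r_base, r_q = rev[i], rev_qual[i]
        if f_q >= r_q:
            seq.append(f_base)
            qual.append(f_q)
        else:
            seq.append(r_base)
            qual.append(r_q)
    # Bases of the reverse read past the overlap extend the merged sequence.
    seq.extend(rev[overlap:])
    qual.extend(rev_qual[overlap:])
    return "".join(seq), qual


merged, scores = quality_consensus("AACG", [40, 40, 10, 40], "TGCC", [35, 20, 30, 30], overlap=2)
# merged == "AATGCC", scores == [40, 40, 35, 40, 30, 30]
```

Here the low-quality forward `C` (Q10) loses to the reverse read's `T` (Q35), while the agreeing `G` keeps the higher score. Taking the maximum is deliberately crude; it never raises the confidence of a base that both reads agree on.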

Jordan

On Feb 24, 2014, at 7:59 PM, Leo Alexander Hansmann <leo2 at stanford.edu> wrote:

Hi,
I'm getting two FASTA files from an Illumina MiSeq run. One contains the forward reads, the other the reverse reads. The records in the two files correspond, meaning the first sequence in the forward read file should pair with the first sequence in the reverse read file. The two sequences of each pair overlap in the middle by a varying number of nucleotides. How can I get Python or Biopython to generate a file with the consensus sequence of each read pair? For example:
sequence in the forward read file: AATCGTCGGTTACTCTG
corresponding line in the reverse read file: CTCTGAGGGAGAGATC
I want: AATCGTCGGTTACTCTGAGGGAGAGATC
Thank you so much!
Leo
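The merge in the question can be sketched in plain Python: find the longest suffix of the forward read that exactly matches a prefix of the reverse read, then append the rest of the reverse read. This is a minimal sketch under several assumptions: overlaps must match exactly (no sequencing errors), the minimum overlap of 5 bases is an arbitrary choice that happens to fit the example, and the reverse reads are already in the forward orientation, as in the example above. Raw MiSeq reverse reads are normally the reverse complement of the fragment, so you would convert them first (for instance with Biopython's `Seq.reverse_complement()`). The names `find_overlap`, `merge_pair` and `merge_fasta_files` are hypothetical, not part of Biopython; only `Bio.SeqIO.parse` is a real Biopython call.

```python
def find_overlap(fwd: str, rev: str, min_overlap: int = 5) -> int:
    """Return the length of the longest suffix of `fwd` that exactly equals a
    prefix of `rev`, or 0 if no overlap of at least `min_overlap` exists."""
    min_overlap = max(min_overlap, 1)  # k == 0 would slice fwd[-0:], i.e. the whole read
    # Try the longest possible overlap first, shrinking one base at a time.
    for k in range(min(len(fwd), len(rev)), min_overlap - 1, -1):
        if fwd[-k:] == rev[:k]:
            return k
    return 0


def merge_pair(fwd: str, rev: str, min_overlap: int = 5):
    """Join two reads across their exact overlap; return None if they don't overlap."""
    k = find_overlap(fwd, rev, min_overlap)
    if k == 0:
        return None
    return fwd + rev[k:]


def merge_fasta_files(fwd_path, rev_path, out_path, min_overlap=5):
    """Pair records by position in two FASTA files and write the merged reads.
    Returns the number of pairs written."""
    from Bio import SeqIO  # imported here so merge_pair() works without Biopython

    written = 0
    with open(out_path, "w") as out:
        # zip() pairs record i of one file with record i of the other,
        # and stops silently at the end of the shorter file.
        pairs = zip(SeqIO.parse(fwd_path, "fasta"), SeqIO.parse(rev_path, "fasta"))
        for fwd_rec, rev_rec in pairs:
            merged = merge_pair(str(fwd_rec.seq), str(rev_rec.seq), min_overlap)
            if merged is None:
                continue  # no acceptable overlap: skip this pair
            out.write(f">{fwd_rec.id}\n{merged}\n")
            written += 1
    return written


print(merge_pair("AATCGTCGGTTACTCTG", "CTCTGAGGGAGAGATC"))
# AATCGTCGGTTACTCTGAGGGAGAGATC
```

Scanning from the longest candidate downward returns the largest exact overlap (5 bases, `CTCTG`, in the example). Short minimum overlaps can match by chance, and real reads containing errors need mismatch tolerance, which is the kind of case dedicated tools such as pandaseq are built for.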
