<div dir="ltr">Thank you both. I'll get to work on both of those suggestions and let you know what I figure out.<div><br></div><div> Damian</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 17, 2015 at 4:01 AM, Ivan Gregoretti <span dir="ltr"><<a href="mailto:ivangreg@gmail.com" target="_blank">ivangreg@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In case it is needed, merging paired reads in FASTQ format can be done<br>
with a tool called FLASH, "Fast Length Adjustment of SHort reads".<br>
<br>
I use it routinely for merging pairs of 2x300 bp from Illumina's technology.<br>
<br>
I hope this helps.<br>
<br>
Ivan<br>
<br>
<br>
<br>
Ivan Gregoretti, PhD<br>
Bioinformatics<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
On Thu, Sep 17, 2015 at 7:08 AM, Peter Cock <<a href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.com</a>> wrote:<br>
> Hi Damian,<br>
><br>
> This sounds very much like the read merging done with paired-end Illumina FASTQ<br>
> files, although here you are presumably using "Sanger" capillary<br>
> sequencing? If so the ABI files can be turned into FASTQ files with<br>
> quality scores rather than just FASTA files (e.g. with Biopython's<br>
> SeqIO). You would probably have to rename your reads, e.g.<br>
> "identifier/1 (space) optional text" and "identifier/2 (space)<br>
> optional text" but I'm not sure how well pair-merging tools would cope<br>
> with these longer reads.<br>
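<br>
A rough sketch of the renaming step suggested above, assuming headers shaped like the ones in the original question, where the first three whitespace-separated tokens name the sample and the fourth is F or R. The function name paired_read_id and the underscore-joined identifier are illustrative choices, not something any particular pair-merging tool requires.<br>

```python
def paired_read_id(description):
    """Build an 'identifier/1 optional text' style header from a description.

    Assumption: tokens 1-3 identify the sample and token 4 is 'F' or 'R'.
    """
    parts = description.split()
    if len(parts) < 4 or parts[3] not in ("F", "R"):
        raise ValueError("unexpected header layout: %r" % description)
    # Join the sample tokens with underscores so the identifier has no spaces.
    identifier = "_".join(parts[:3])
    mate = "1" if parts[3] == "F" else "2"
    header = identifier + "/" + mate
    # Keep any remaining text (e.g. the primer name) after a single space.
    rest = " ".join(parts[4:])
    return header + " " + rest if rest else header


print(paired_read_id("UAR Kaktovik 11-004 F L15774b(M13F)"))
print(paired_read_id("UAR Kaktovik 11-004 R CSBCH(M13R)"))
```

With the two example headers this prints UAR_Kaktovik_11-004/1 L15774b(M13F) and UAR_Kaktovik_11-004/2 CSBCH(M13R).<br>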
><br>
> Peter<br>
><br>
><br>
> On Wed, Sep 16, 2015 at 10:25 PM, Damian Menning <<a href="mailto:dmenning@mail.usf.edu">dmenning@mail.usf.edu</a>> wrote:<br>
>> Hello All,<br>
>><br>
>><br>
>> I have a fasta dataset in a single file with multiple paired end reads in<br>
>> paired sets of forward and reverse sequences (the reverse sequence is in the<br>
>> correct orientation). I am pretty sure this is the real world example<br>
>> requested in 6.1.3 of the Biopython Cookbook :). Within this dataset all of<br>
>> the information is the same i.e. ID:, Name:, Number of features:. The only<br>
>> exceptions are the descriptions and sequences. Ex.<br>
>><br>
>><br>
>>>UAR Kaktovik 11-004 F L15774b(M13F)<br>
>><br>
>> GTAGTATAGCAATTACCTTGGTCTTGTAAGCCAAAAACGGAGAATACCTACTCTCCCTAA<br>
>><br>
>> GACTCAAGGAAGAAGCAACAGCTCCACTACCAGCACCCAAAGCTAATGTTCTATTTAAAC<br>
>><br>
>> TATTCCCTGGTACATACTACTATTTTACCCCATGTCCTATTCATTTCATATATACCATCT<br>
>><br>
>> TATGTGCTGTGCCATCGCAGTATGTCCTCGAATACCTTTCCCCCCCTATGTATATCGTGC<br>
>><br>
>> ATTAATGGTGTGCCCCATGCATATAAGCATGTACATATTACGCTTGGTCTTACATAAGGA<br>
>><br>
>> CTTACGTTCCGAAAGCTTATTTCAGGTGTATGGTCTGTGAGCATGTATTTCACTTAGTCC<br>
>><br>
>> GAGAGCTTAATCACCGGGCCTCGAGAAACCAGCAACCCTTGCGAGTACGTGTACCTCTTC<br>
>><br>
>> TCGCTCCGGGCCCATGGGGTGTGGGGGTTTCTATGTTGAAACTATACCTGGCATCTG<br>
>><br>
>><br>
>><br>
>>>UAR Kaktovik 11-004 R CSBCH(M13R)<br>
>><br>
>> TCCCTTCATTATTATCGGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACT<br>
>><br>
>> TATACCTATCGCTGGAATTATTGAAAACAGCCTCTTAAAGTGGAGAGTCTTTGTAGTATA<br>
>><br>
>> GCAATTACCTTGGTCTTGTAAGCCAAAAACGGAGAATACCTACTCTCCCTAAGACTCAAG<br>
>><br>
>> GAAGAAGCAACAGCTCCACTACCAGCACCCAAAGCTAATGTTCTATTTAAACTATTCCCT<br>
>><br>
>> GGTACATACTACTATTTTACCCCATGTCCTATTCATTTCATATATACCATCTTATGTGCT<br>
>><br>
>> GTGCCATCGCAGTATGTCCTCGAATACCTTTCCCCCCCTATGTATATCGTGCATTAATGG<br>
>><br>
>> TGTGCCCCATGCATATAAGCATGTACATATTACGCTTGGTCTTACATAAGGACTTACGTT<br>
>><br>
>> CCGAAAGCTTATTTCAGGTGTATGGTCTGTGAGCATGTATTTCACTTAGTCCGAGAGCTT<br>
>><br>
>> AATCACCGGGCCTCGAGAAACCAGCAACCCTTGCGAGTACGTGTACCTCTTCTCGCTCCG<br>
>><br>
>> GGCCCATGGGGTGTGGGGGTTTCTATGTTGAAACTATACCTG<br>
>><br>
>><br>
>><br>
>> My end goal is to align the paired ends of the sequences that have the same<br>
>> description and save the aligned sequence to another file for further<br>
>> analyses. I have a few problems:<br>
>><br>
>><br>
>><br>
>> 1) The descriptions of each sequence are not identical so I need to delete<br>
>> all but the first three parts and include the associated sequence. I.e.<br>
>> remove F L15774b(M13F) and R CSBCH(M13R) above. The script below is what I<br>
>> have to make a new dictionary in this format. Is this the best way to<br>
>> proceed in order to align the sequences in the next step?<br>
>><br>
>><br>
>><br>
>> handle = open("pairedend2.txt", 'r')<br>
>><br>
>><br>
>> output_handle = open("AlignDict.txt", "a")<br>
>><br>
>><br>
>> desc2=dict()<br>
>><br>
>> from Bio import SeqIO<br>
>><br>
>> for seq_record in SeqIO.parse(handle, "fasta"):<br>
>><br>
>>     parts = seq_record.description.split(" ")<br>
>><br>
>>     des = [str(parts[0] + ' ' + parts[1] + ' ' + parts[2] + ':' +<br>
>>         seq_record.seq)]<br>
>><br>
>>     desc2=(dict(v.split(':') for v in des))<br>
>><br>
>>     print ('\n' + str(desc2))<br>
>><br>
>>     output_handle.write(str(desc2) + '\n')<br>
>><br>
>><br>
>><br>
>> output_handle.close()<br>
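<br>
One possible shape for step 1, sketched with the standard library only so it runs without Biopython; the same grouping could be driven by SeqIO.parse as in the script above. It assumes the first three tokens of each description identify the sample and the fourth is the read direction (F or R). The helper names read_fasta and group_pairs are hypothetical.<br>

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (description, sequence) tuples from an iterable of FASTA lines."""
    description, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:].strip(), []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def group_pairs(lines):
    """Map 'first three tokens' -> {'F': sequence, 'R': sequence}."""
    pairs = defaultdict(dict)
    for description, sequence in read_fasta(lines):
        parts = description.split()
        key = " ".join(parts[:3])
        # Assumption: the fourth token is the direction; '?' if it is missing.
        direction = parts[3] if len(parts) > 3 else "?"
        pairs[key][direction] = sequence
    return dict(pairs)


# Shortened stand-ins for the records in the question.
example = """>UAR Kaktovik 11-004 F L15774b(M13F)
GTAGTATAGC

AATTACC
>UAR Kaktovik 11-004 R CSBCH(M13R)
TCCCTTCATT
""".splitlines()

pairs = group_pairs(example)
print(pairs)
```

Keeping one dictionary for the whole file, rather than rebuilding it for every record, means both reads of a pair end up under the same key and can be handed to the alignment step together.<br>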
>><br>
>><br>
>><br>
>> 2) My second issue is figuring out how to do the alignment. I thought I<br>
>> would do a pairwise alignment using something like needle (or is there a better<br>
>> way?) but the script examples I have seen so far use two files with a single<br>
>> sequence in each and I have one file with multiple sequences. There is no<br>
>> easy way to split these into one file per sequence<br>
>> as the data sets are quite large.<br>
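<br>
For reference, EMBOSS needle performs a global (Needleman-Wunsch) alignment, while water is the local (Smith-Waterman) counterpart. Where the two reads of a pair agree exactly across their overlap, a much simpler merge may be enough. The sketch below is a deliberately naive illustration: it only accepts an exact suffix/prefix match, so reads with sequencing errors in the overlap would still need a real aligner or a merging tool. The function name merge_by_overlap and the min_overlap default are assumptions made for the example.<br>

```python
def merge_by_overlap(upstream, downstream, min_overlap=20):
    """Merge two reads where a suffix of `upstream` equals a prefix of
    `downstream`. Returns the merged string, or None if no exact overlap
    of at least `min_overlap` bases is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(upstream), len(downstream))
    # Try the longest candidate overlap first, then shrink.
    for size in range(longest, min_overlap - 1, -1):
        if upstream.endswith(downstream[:size]):
            return upstream + downstream[size:]
    return None


# Toy reads: the last 10 bases of `rev` match the first 10 bases of `fwd`.
rev = "TCCCTTGTAGTATAGC"
fwd = "GTAGTATAGCAATTACC"

# The read order is not known in advance, so try both arrangements.
merged = merge_by_overlap(rev, fwd, min_overlap=5) or merge_by_overlap(fwd, rev, min_overlap=5)
print(merged)
```

Because each pair is merged in memory, every pair in the single input file can be processed in a loop and written to one output file, with no need for one file per sequence.<br>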
>><br>
>><br>
>><br>
>> Any help/ideas would be greatly appreciated.<br>
>><br>
>><br>
>><br>
>> Thank you<br>
>><br>
>><br>
>> Damian<br>
>><br>
>><br>
>> --<br>
>> Damian Menning, Ph.D.<br>
>><br>
>> _______________________________________________<br>
>> Biopython mailing list - <a href="mailto:Biopython@mailman.open-bio.org">Biopython@mailman.open-bio.org</a><br>
>> <a href="http://mailman.open-bio.org/mailman/listinfo/biopython" rel="noreferrer" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biopython</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><font face="arial, helvetica, sans-serif" size="2">Damian Menning, Ph.D.</font></div></div></div></div></div></div></div></div></div>
</div>