[Biopython] Concatenate to aligned sequences

Karolis Ramanauskas karolisr at gmail.com
Fri Feb 15 17:28:06 UTC 2013


Good day,

I have written a function that takes a list of alignments and
concatenates them based on the sequence ids. The advantage here is that
the alignments do not have to contain the same number of sequences, which
is helpful when you are trying to create one big alignment for
phylogenetic applications and some taxa are missing certain sequences.

The concatenate function is here:
https://github.com/karolisr/krpy/blob/master/kralign.py
The other functions in that file can be ignored; concatenate depends
only on Biopython.
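
To illustrate the idea, here is a rough sketch. It is not the code from
kralign.py. It assumes each alignment is a plain dict mapping sequence id
to an aligned string (with Biopython you could build one with something
like {rec.id: str(rec.seq) for rec in alignment}). The name
concatenate_by_id and the sample taxa are made up for this example. Taxa
missing from an alignment are padded with gap characters so that every
row ends up the same length.

```python
def concatenate_by_id(alignments, gap="-"):
    """Concatenate alignments column-wise, matching rows by sequence id.

    alignments: list of dicts mapping sequence id -> aligned sequence string.
    Ids missing from an alignment are padded with gap characters.
    """
    # Collect every id seen in any alignment, keeping first-seen order.
    all_ids = []
    for aln in alignments:
        for seq_id in aln:
            if seq_id not in all_ids:
                all_ids.append(seq_id)

    combined = {seq_id: [] for seq_id in all_ids}
    for aln in alignments:
        # All rows of one alignment must have the same width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) > 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop() if lengths else 0
        for seq_id in all_ids:
            # Use the real sequence if present, otherwise a run of gaps.
            combined[seq_id].append(aln.get(seq_id, gap * width))

    return {seq_id: "".join(parts) for seq_id, parts in combined.items()}


# Hypothetical example data: taxonB lacks gene2, taxonC lacks gene1.
gene1 = {"taxonA": "ACGT", "taxonB": "AC-T"}
gene2 = {"taxonA": "GGA", "taxonC": "G-A"}
result = concatenate_by_id([gene1, gene2])
```

Padding with "-" is only one choice; some phylogenetics programs prefer
"?" or "N" for missing data, which is why the gap character is a
parameter in this sketch.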

Peace

On Thu, Feb 14, 2013 at 11:20 AM, Vincent Davis
<vincent at vincentdavis.net> wrote:
> I have 2 fasta files from a MUSCLE alignment. Both have the same number of
> sequences from the same organism. If I want to concatenate the pairs of
> sequences, what is the best way to do this?
> Right now I am doing this:
>
> from Bio import SeqIO
>
> def concatenate(fa1, fa2):
>     fa1open = open(fa1, "r")
>     fa2open = open(fa2, "r")
>     fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>     fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     fa1open.close()
>     fa2open.close()
>     # check that both files have the same sequence ids
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     for key in fa2dict.keys():
>         # reuse the record from the second file and extend its sequence
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003


