[Biopython-dev] Cookbook entry, concatenating nexus files

Cymon Cox cy at cymon.org
Thu May 14 09:59:51 UTC 2009

2009/5/14 Peter <biopython at maubp.freeserve.co.uk>

> On Thu, May 14, 2009 at 5:53 AM, David Winter
> <winda002 at student.otago.ac.nz> wrote:
> >
> > I have been slowly adding some of the scripts I use most commonly to the
> > cookbook section of the wiki (http://biopython.org/wiki/Category:Cookbook).
> > Since I'm very much a dilettante at this programming business and the
> > cookbook is meant as supplementary documentation for Biopython, it's
> > probably a good idea for someone who knows what they are doing to look at
> > these things (Peter has been really helpful with this thus far, but it
> > seems unfair to saddle one man with so much bad programming :)
> >
> > I've just added a recipe that uses the nexus class to concatenate multiple
> > nexus files and provide some feedback if the taxa are not the same in each
> > one: http://biopython.org/wiki/Concatenate_nexus
> >
> > Any thoughts? If you think you can make it clearer/quicker/better then you
> > can edit it on the wiki or provide comments here or there.
>
> What exactly are you trying to achieve?  A big Nexus file with lots
> of alignments (and trees) in it?

The example David has given is very useful and a common procedure for
phylogeneticists. Single genes/proteins tend to be aligned in separate
alignment files and then concatenated into a so-called 'supermatrix'.
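
For anyone unfamiliar with the procedure, the concatenation step can be
sketched in plain Python, independent of Biopython. This is only an
illustration: the function name `concatenate`, the list-of-dicts input, and
the use of '?' to pad taxa that are missing from a gene are assumptions made
for this example, not taken from David's recipe.

```python
def concatenate(alignments, missing="?"):
    """Join per-gene alignments into one supermatrix.

    alignments: list of (gene_name, {taxon: aligned_sequence}) pairs.
    Returns (supermatrix, partitions), where partitions maps each gene
    to its 1-based (start, end) column range in the supermatrix.
    """
    # Collect every taxon seen in any gene, in a stable (sorted) order.
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 1
    for gene, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this gene so all rows stay aligned.
            supermatrix[taxon] += aln.get(taxon, missing * length)
        partitions[gene] = (start, start + length - 1)
        start += length
    return supermatrix, partitions


# Hypothetical toy data: two genes, one taxon missing from the first.
genes = [
    ("rbcL", {"Moss": "ACGT", "Fern": "ACGA"}),
    ("atpB", {"Moss": "GG", "Fern": "GC", "Pine": "TC"}),
]
matrix, parts = concatenate(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```

Recording the column range of each gene keeps the partitions recoverable
after the data have been combined.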

One thing I would question is the first line:

"It's a good idea, if possible, to make species-level phylogenetic
inferences based on multiple genes because a) demographic processes can lead
gene-trees to diverge from species trees and b) journal editors know this."

Yes, it is a good idea to base inferences on the largest amount of data, but
if demographic processes have led some gene(s) to diverge from the species
tree, then that is a reason not to combine them. Phylogenetic inference
assumes all data evolved on the same tree; typically one would analyse gene
partitions individually to look for incongruence among partitions before
combining the data.
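
As a rough illustration of what looking for incongruence can mean, the sketch
below counts the clades on which two rooted gene trees disagree (a rooted
variant of the Robinson-Foulds distance). The nested-tuple tree representation
and the function names are assumptions made for this example. A real analysis
would use inferred trees with support values and a proper test, since
topological differences alone do not show that the conflict is significant.

```python
def clades(tree):
    """Return (all leaves, non-trivial clades) of a nested-tuple rooted tree.

    Leaves are strings; internal nodes are tuples of child subtrees.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        # The clade at an internal node is the union of its children's leaves.
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return all_leaves, found


def clade_distance(tree_a, tree_b):
    """Count clades present in one tree but not in the other."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(clades_a ^ clades_b)


# Hypothetical gene trees for four taxa A-D.
gene1 = (("A", "B"), ("C", "D"))
gene2 = (("A", "C"), ("B", "D"))
print(clade_distance(gene1, gene1))
print(clade_distance(gene1, gene2))
```

A distance of zero means the two gene trees agree on every clade; larger
values flag partitions worth inspecting before they are combined.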

> When I talked to Frank about Nexus files, he said they should only
> ever hold one alignment matrix,

Well, that was my understanding as well. But, it may be wrong. I just tried
it: p4 will read both matrices with no problem, and PAUP* (the de facto
standard here) will execute both matrices OK, presumably leaving just the
last as the data in memory.

I'll look into this further...

Cheers C.

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
