[Bioperl-l] Comparative genomics - regions of synteny and whole genome duplication

Lukasz Huminiecki lucash@ebi.ac.uk
Wed, 3 Oct 2001 12:23:39 +0100 (BST)

Just a few thoughts on the question of blocks/regions of synteny.

There is some great work done on that by Ken Wolfe et al. from Trinity
College Dublin on yeast, fish and more recently humans. I spoke to him
about it. What was most striking is that he defines blocks of synteny by
similarity on the level of genes rather than huge chunks of genomic
sequence. Non-coding sequence diverges too fast to be informative here. So
blocks of synteny are best defined by clusters of orthologue genes on

For example: S. cerevisiae   A1 B1 C1 D1
             S. pombe        A2 B2 C2 D2

where A1 and A2, B1 and B2 ... are orthologue pairs. 

Sometimes you can have positional rearrangements that change gene order a
bit, but this is still a region of synteny in evolutionary terms. 

For example: S. cerevisiae   A1 C1 B1 D1
             S. pombe        A2 B2 C2 D2

Sometimes one of the genes can be lost (for example when a given species
of yeast does not need a certain metabolic pathway anymore).

For example: S. cerevisiae   A1 C1 D1
             S. pombe        A2 B2 C2 D2
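
The three cases above (conserved order, a small rearrangement, a lost
gene) can be sketched in a few lines of Python. This is only a rough
illustration, not the method used by Ken Wolfe's group: the function name
synteny_blocks and the parameters max_gap and min_genes are my own
assumptions, and each gene is simply compared with the previous anchored
gene rather than with the block as a whole.

```python
from typing import Dict, List, Tuple


def synteny_blocks(
    order_a: List[str],
    order_b: List[str],
    orthologues: Dict[str, str],
    max_gap: int = 2,    # assumed tolerance for rearrangement / gene loss
    min_genes: int = 3,  # assumed minimum number of orthologue pairs per block
) -> List[List[Tuple[str, str]]]:
    """Group orthologue pairs into blocks using gene order only."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}

    # Anchors: genes of genome A whose orthologue is present in genome B.
    anchors = []
    for i, gene in enumerate(order_a):
        partner = orthologues.get(gene)
        if partner in pos_b:
            anchors.append((i, pos_b[partner], gene, partner))

    blocks = []
    current = []
    for anchor in anchors:
        if current:
            prev = current[-1]
            near_in_a = anchor[0] - prev[0] <= max_gap
            near_in_b = abs(anchor[1] - prev[1]) <= max_gap
            if not (near_in_a and near_in_b):
                # The neighbourhood is broken: close the current block.
                if len(current) >= min_genes:
                    blocks.append(current)
                current = []
        current.append(anchor)
    if len(current) >= min_genes:
        blocks.append(current)
    return [[(a[2], a[3]) for a in block] for block in blocks]


orthologues = {"A1": "A2", "B1": "B2", "C1": "C2", "D1": "D2"}
pombe = ["A2", "B2", "C2", "D2"]
conserved = synteny_blocks(["A1", "B1", "C1", "D1"], pombe, orthologues)
rearranged = synteny_blocks(["A1", "C1", "B1", "D1"], pombe, orthologues)
gene_loss = synteny_blocks(["A1", "C1", "D1"], pombe, orthologues)
```

With these toy inputs each of the three calls returns a single block, so
the rearranged and gene-loss cases are still reported as one region. Note
that no genomic sequence is compared anywhere, only gene order.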

The major point is that you have to look at the order of genes AND NOT
genome similarity for regions of synteny. And this is in yeast where genes
are very densely located and there is actually very little non-coding
sequence. In man the amount of non-coding sequence and repeats is
just shocking. Fugu is a bit better - actually one of the major scientific
reasons to sequence fugu was that it has small genes and little
non-coding sequence. Perhaps if you lose all Alus, LINEs and
SINEs you are bound to become as ugly as... fugu ;)

There is one pitfall here: genes quite frequently duplicate and you can
get sort-of self-hits which have nothing to do with synteny.

For example:              A1 A2
                          A1 A2

where A1 and A2 arose by simple gene duplication. These have to be
prescreened for.
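
One possible way to do such a prescreen is sketched below in Python. It
assumes you already have some within-genome family assignment (here a
hypothetical dict called family) and collapses runs of neighbouring genes
from the same family to a single representative before any synteny
search. The function name and the max_distance parameter are assumptions
for illustration only.

```python
from typing import Dict, List


def collapse_tandem_duplicates(
    order: List[str],
    family: Dict[str, str],
    max_distance: int = 1,  # assumed: how close two copies must be to count as tandem
) -> List[str]:
    """Keep only the first gene of each run of nearby same-family genes."""
    kept = []
    last_seen: Dict[str, int] = {}  # family id -> position of latest member
    for i, gene in enumerate(order):
        # Genes without a family assignment are treated as singletons.
        fam = family.get(gene, gene)
        prev = last_seen.get(fam)
        last_seen[fam] = i
        if prev is not None and i - prev <= max_distance:
            continue  # a tandem copy of a gene we have already kept
        kept.append(gene)
    return kept


family = {"A1": "A", "A2": "A"}
tandem = collapse_tandem_duplicates(["A1", "A2", "B1", "C1"], family)
distant = collapse_tandem_duplicates(["A1", "B1", "C1", "A2"], family)
```

In the first call A2 is dropped because it sits right next to A1; in the
second call both copies are kept because they are not neighbours.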

Another point that also concerns the discussion on gene
families/clustering/orthologues/paralogues: the real biological criterion
for orthology and paralogy is not sequence similarity, it's a
common pathway of molecular evolution! Using short sequence motifs (for
example enzyme active sites) is dangerous because evolutionarily unrelated
genes may have converged to have similar function/active centres. You can
do it, but realise that you are clustering on function and NOT homology!

Another biological phenomenon to watch out for is whole genome
duplication. A gene that exists in one copy in C. albicans should have two
orthologues in S. cerevisiae (one round of whole genome
duplication). Similarly two paralogues in fugu should have four
orthologues in man (it's estimated that vertebrates underwent two rounds of
whole genome duplication on the way to humans) unless some genes got
selected against and were deleted from the genome. Admittedly, looking for
the evidence of whole genome duplication is easier between species of
yeast than it is between humans and fugu where evolutionary distances are
much larger.
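
The expected one-to-two (or two-to-four) pattern can be checked with a
simple tally. The Python sketch below assumes a hypothetical mapping from
each gene in the unduplicated genome to the list of its orthologues in
the duplicated genome; the gene names and function names are made up for
illustration, and a real analysis would also have to check that the two
copies lie in different regions.

```python
from collections import Counter
from typing import Dict, List


def copy_number_profile(orthologues: Dict[str, List[str]]) -> Counter:
    """Count how many genes have 0, 1, 2, ... orthologues in the other genome."""
    return Counter(len(set(partners)) for partners in orthologues.values())


def duplicate_retention(orthologues: Dict[str, List[str]]) -> float:
    """Fraction of genes with any orthologue that kept two or more copies."""
    counts = [len(set(partners)) for partners in orthologues.values()]
    present = sum(1 for n in counts if n >= 1)
    retained = sum(1 for n in counts if n >= 2)
    return retained / present if present else 0.0


# Hypothetical toy data: g1 and g3 kept both copies, g2 lost one, g4 lost both.
toy = {"g1": ["a", "b"], "g2": ["c"], "g3": ["d", "e"], "g4": []}
profile = copy_number_profile(toy)
retention = duplicate_retention(toy)
```

Genes that come out as one-to-one here are the ones where a copy was
presumably deleted after the duplication.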

I think some of these facts should be taken into account when designing
our software and databases: especially the definition of regions of synteny
by clusters of neighbouring orthologues!

See papers from Ken Wolfe for more (lots of stuff in Science and Nature).



Lukasz Huminiecki, D.Phil.                Tel: 01223 494451
EnsEMBL programmer                        Fax: 01223 494468
European Bioinformatics Institute         lucash@ebi.ac.uk
EMBL Outstation - Hinxton                 Home: 01799 528196
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK