[BioRuby] Drawing a phylogeny ASCII tree

Pjotr Prins pjotr.public14 at thebird.nl
Sun Mar 18 10:09:05 UTC 2012


On Fri, Mar 16, 2012 at 02:51:53PM +0100, Pjotr Prins wrote:
> Anyone interested in a little coding challenge? I wrote a feature
> for drawing a phylogeny ASCII tree:
> 
> (snip)
>
> Then draw MSA with the short tree
>   """
>   +----------------- seq7  ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
>   |           ,----- seq1  ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
>   |        ,--|  ,-- seq2  SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
>   |     ,--+  `--+-- seq3  SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
>   |     |  |--+----- seq5  ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
>   |  ,--|     `----- seq8  --------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
>   |--|  `----------- seq4  ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
>      `-------------- seq6  ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
>      1  2  3  4  5
>   """

I have been thinking about this 'ASCII cladogram' algorithm, as it
would be very useful for testing code. Unfortunately I have found no
example that really appeals to me in the other Bio* projects (we are
talking text here, not graphics; the graphics ones are actually
nicer). 

The examples I have are:

* BioPerl tabtree
* Ruby challenge http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/149701
* Biopython's draw_ascii http://biopython.org/DIST/docs/api/Bio.Phylo._utils-pysrc.html#draw_ascii

I have mailed T. Mike Keesey, of
http://www.palaeos.org/Ascii_phylogenetic_tree, as he has an interesting
generator, but he has not responded yet.

So, assuming I am on my own, I was thinking that the tree needs to be
drawn in a matrix of characters. The pseudocode I came up with is
based on 'matrix expansion': every time a sequence gets added, we
add a line. If the tree branches off to the left (up), e.g. at the last
node before seq2, we expand the matrix by injecting a new line and
extending the verticals. We extend the horizontals in the final step.
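
The walkthrough that follows needs two numbers per leaf: how many nodes
sit between the root and the leaf, and at which of those nodes the leaf
branches off from the leaf drawn just before it. A minimal Ruby sketch of
how these could be derived, assuming the tree is held as nested arrays
(the method name leaf_list, the triple layout and the nested-array tree
itself are my own illustration, read off the step drawings, not BioRuby
API):

```ruby
# Hypothetical helper: walk a nested-array tree depth first and return
# one [name, depth, join] triple per leaf, in drawing order.
#   depth = number of internal nodes between the root and the leaf
#   join  = node number where the leaf leaves the previous leaf's path
#           (0 means the root; it is meaningless for the very first leaf)
def leaf_list(node, depth = 0, join = 0, out = [])
  node.each_with_index do |child, i|
    # The first child continues the path we arrived on; later children
    # branch off at the current node.
    j = i.zero? ? join : depth
    if child.is_a?(Array)
      leaf_list(child, depth + 1, j, out)
    else
      out << [child, depth, j]
    end
  end
  out
end

# Assumed topology of the example tree, built bottom-up for readability.
node_b = ['seq2', 'seq3']
node_a = ['seq1', node_b]
node_c = ['seq5', 'seq8']
node3  = [node_a, node_c]
node2  = [node3, 'seq4']
node1  = [node2, 'seq6']
EXAMPLE_TREE = ['seq7', node1]

leaf_list(EXAMPLE_TREE).each { |name, depth, join| puts "#{name} depth=#{depth} join=#{join}" }
```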

Assuming we know the tree with leaves and nodes:

step 1, start with the first leaf seq7 (R=root, L=Leaf, o=node, N=new node)

  R--L seq7

step 2, add seq1 (4 nodes)

  R--L seq7
  `--o--o--o--o--L seq1

step 3, add seq2 (5 nodes)

  R--L seq7
  |           ,--L seq1
  `--o--o--o--N--o--L seq2

What you see is that seq1 sits on the branch of the new seq2. So after
adding seq2, we wipe the seq1 row to the left of the new node (N) and
keep only the verticals on that side.

step 4, add seq3 (5 nodes)


  R--L seq7
  |           ,--L seq1
  |           |  ,--L seq2
  `--o--o--o--|--N--L seq3

Same principle. After adding seq3, we find seq2 is connected, so we
split the seq2 row: on the left side we extend the verticals, on the
right side we connect.

step 5, add seq5 (4 nodes)

  R--L seq7
  |           ,--L seq1
  |           |  ,--L seq2
  |        ,--|--N--L seq3
  `--o--o--N--o--L seq5

Again, seq5 connects at the 3rd node. So we split the row above (seq3) there.

step 6, add seq8 (4 nodes)

  R--L seq7
  |           ,--L seq1
  |           |  ,--L seq2
  |        ,--|--N--L seq3
  |        |  ,--L seq5
  `--o--o--|--N--L seq8

Split on node 4. Extend the verticals to the left of N.
Here the algorithm's result differs a little from the original drawing.

step 7, add seq4 (2 nodes)

  R--L seq7
  |           ,--L seq1
  |           |  ,--L seq2
  |        ,--|--N--L seq3
  |        |  ,--L seq5
  |     ,--|--N--L seq8
  `--o--N--L seq4

step 8, add seq6 (1 node)


  R--L seq7
  |           ,--L seq1
  |           |  ,--L seq2
  |        ,--|--N--L seq3
  |        |  ,--L seq5
  |     ,--|--N--L seq8
  |  ,--N--L seq4
  `--N--L seq6
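
Steps 1 to 8 can be sketched in Ruby roughly as below. This is only my
reading of the drawings, under some assumptions: every node takes three
columns, each leaf is given as [name, depth, join] where join is the node
at which it branches off the previous leaf's path, and only the second
leaf hangs directly off the root. The names add_leaf, build_rows and
LEAVES are made up for the illustration.

```ruby
# Column of node k: two leading spaces, then three characters per node.
def node_column(k)
  2 + 3 * k
end

# Append one leaf to the character matrix (an array of row strings).
def add_leaf(rows, name, depth, join)
  if rows.empty?
    rows << '  R' + '--o' * depth + "--L #{name}"
    return rows
  end
  if join.zero?
    # Assumption: only the second leaf joins at the root; the root row
    # stays untouched and the new row starts with the corner.
    left = '  `'
  else
    col  = node_column(join)
    prev = rows.last
    head = prev[0...col]
    # Split the previous row: left of the join keep only verticals
    # (old corner and old junctions become '|'), put a ',' at the join
    # column, and keep everything to the right as it was.
    rows[-1] = head.gsub(/[^|N`]/, ' ').tr('N`', '|') + ',' + prev[(col + 1)..-1]
    # The new row reuses the path up to the join; older junctions on it
    # are now crossed by a vertical, and the join becomes the new node N.
    left = head.gsub('N', '|') + 'N'
  end
  rows << left + '--o' * (depth - join) + "--L #{name}"
end

def build_rows(leaves)
  leaves.each_with_object([]) { |(name, depth, join), rows| add_leaf(rows, name, depth, join) }
end

# [name, depth, join] as read off the step drawings (assumed values).
LEAVES = [
  ['seq7', 0, 0], ['seq1', 4, 0], ['seq2', 5, 4], ['seq3', 5, 5],
  ['seq5', 4, 3], ['seq8', 4, 4], ['seq4', 2, 2], ['seq6', 1, 1]
].freeze

puts build_rows(LEAVES)
```

Earlier rows never need to be revisited: only the current bottom row is
split, which is what keeps each step a local edit of the matrix.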

step 9, we can expand horizontally


  R----------------- seq7
  |           ,----- seq1
  |           |  ,-- seq2
  |        ,--+--+-- seq3
  |        |  ,----- seq5
  |     ,--+--+----- seq8
  |  ,--+----------- seq4
  `--+-------------- seq6
     1  2  3  4  5
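
The horizontal expansion of step 9 is a per-row text substitution. A
small Ruby sketch, assuming the rows of step 8 as input and that labels
never contain the marker 'L ' (expand_horizontally and STEP8_ROWS are
hypothetical names for the illustration):

```ruby
# Rows as they stand after step 8.
STEP8_ROWS = [
  '  R--L seq7',
  '  |           ,--L seq1',
  '  |           |  ,--L seq2',
  '  |        ,--|--N--L seq3',
  '  |        |  ,--L seq5',
  '  |     ,--|--N--L seq8',
  '  |  ,--N--L seq4',
  '  `--N--L seq6'
].freeze

def expand_horizontally(rows)
  # Separate the drawing from the label at the leaf marker 'L '.
  parts = rows.map { |row| row.split(/L /, 2) }
  width = parts.map { |drawing, _label| drawing.length }.max
  parts.map do |drawing, label|
    line = drawing.gsub(/\|(?=-)/, '+') # vertical crossed by a branch
                  .gsub('N', '+')       # branching node
                  .gsub('o', '-')       # pass-through node
    # Pad with dashes so that all labels start in the same column.
    line.ljust(width, '-') + ' ' + label
  end
end

puts expand_horizontally(STEP8_ROWS)
```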

I think this is pretty much what Mike Keesey does on

  http://namesonnodes.org/texttree/

Anyone see a flaw in my reasoning?

Pj.


