[Biopython] Codeml parser in Biopython?

natassa natassa_g_2000 at yahoo.com
Tue Sep 14 08:02:18 UTC 2010


Hi Peter, 




> Could you post a short example of the kind of output you are looking at?

Here is an example output, but this can differ depending on the model used
(there are several models for Branch, Site and BranchSite, but all are pretty
standard).


-------------------------------------------------------------------------------OUTPUT-------------------------


seed used = 808671289
CODONML (in paml version 4.4, January 2010)  align.phy
Model: One dN/dS ratio for branches
Codon frequency model: F3x4
Site-class models:  NearlyNeutral
ns =   7  ls = 861

Codon usage in sequences
--------------------------------------------------------------------------------------------------------------

Phe TTT 12 14 15 14 14 12 | Ser TCT  6 11 12  8 10  6 | Tyr TAT  5  5  4  7  9  5 | Cys TGT 11  8 10  9 11  8
    TTC 23 18 18 20 20 20 |     TCC 16 13 16 19 16 18 |     TAC 11 12 13 17 11 13 |     TGC  6  2  6  6  4  6
Leu TTA  8  5  6  5  4  2 |     TCA 17 16 18 20 21 15 | *** TAA  0  0  0  0  0  0 | *** TGA  0  0  0  0  0  0
    TTG 13 11 11 15 15 17 |     TCG 17 14 14 17 17 18 |     TAG  0  0  0  0  0  0 | Trp TGG  9  8  8 11  8  7
--------------------------------------------------------------------------------------------------------------

Leu CTT 13 15 16 11 12 16 | Pro CCT  7  7 10  6 10  8 | His CAT  8  7  8  4  6  5 | Arg CGT  6  4  5  4  5  5
    CTC 14 14 13 19 14 15 |     CCC 20 13 16 24 19 20 |     CAC 23 18 22 20 24 17 |     CGC 14 13 15 14 14 15
    CTA  6  4  8  7  6  9 |     CCA 19 18 19 11 17 16 | Gln CAA 20 16 20 21 18 13 |     CGA  8  4  6  5  6  6
    CTG 17 17 14 14 16 10 |     CCG  7  8  8  9  6  8 |     CAG 18 14 15 14 14 13 |     CGG  7  7  8  9  9  8
--------------------------------------------------------------------------------------------------------------

Ile ATT  6  7  9  5  7  6 | Thr ACT  5  7  7  7  5  4 | Asn AAT  3  3  4  2  5  2 | Ser AGT  7  7  9  8  7  7
    ATC 16 13 15 23 14 16 |     ACC 21 14 17 20 20 16 |     AAC 12 14 14 21 14 11 |     AGC 14 13 14 15 11 10
    ATA 13  9 10 11 11 10 |     ACA 19 17 22 22 28 18 | Lys AAA 17  8 13  9 13 12 | Arg AGA 11  5  8  4  6  5
Met ATG 23 21 23 22 23 20 |     ACG 11 12 12 12 14 13 |     AAG 18 15 19 19 18 18 |     AGG  9 10 13 14 12 13
--------------------------------------------------------------------------------------------------------------

Val GTT  8 13 10 10 10  6 | Ala GCT 13 10 12 12 14 13 | Asp GAT 18 18 17 15 15 17 | Gly GGT 13  7 12 10 11 10
    GTC 18 13 18 20 19 21 |     GCC 28 26 28 28 28 23 |     GAC 29 21 26 33 29 30 |     GGC  9  9  8  7 12  8
    GTA  8  8  9  7  6  7 |     GCA 22 22 24 17 23 19 | Glu GAA 27 24 24 27 21 22 |     GGA  7  7 10  9  7  9
    GTG 13 11 14 13 13  9 |     GCG 11 10 10 10  7  7 |     GAG 14 14 17 13 19 17 |     GGG  7  6  9  8  7  9
--------------------------------------------------------------------------------------------------------------


--------------------------------------------------
Phe TTT 12 | Ser TCT  8 | Tyr TAT  6 | Cys TGT  8
    TTC 22 |     TCC 18 |     TAC 15 |     TGC  6
Leu TTA  5 |     TCA 22 | *** TAA  0 | *** TGA  0
    TTG 17 |     TCG 17 |     TAG  0 | Trp TGG  9
--------------------------------------------------
Leu CTT 14 | Pro CCT 12 | His CAT  5 | Arg CGT  6
    CTC 19 |     CCC 20 |     CAC 20 |     CGC 13
    CTA 10 |     CCA 16 | Gln CAA 17 |     CGA  5
    CTG  8 |     CCG 11 |     CAG 15 |     CGG  8
--------------------------------------------------
Ile ATT  5 | Thr ACT  4 | Asn AAT  4 | Ser AGT  7
    ATC 20 |     ACC 21 |     AAC 14 |     AGC 12
    ATA 11 |     ACA 29 | Lys AAA 11 | Arg AGA  4
Met ATG 25 |     ACG 15 |     AAG 23 |     AGG 13
--------------------------------------------------
Val GTT 10 | Ala GCT 13 | Asp GAT 16 | Gly GGT  7
    GTC 18 |     GCC 26 |     GAC 33 |     GGC 11
    GTA  7 |     GCA 24 | Glu GAA 23 |     GGA 11
    GTG 10 |     GCG  8 |     GAG 15 |     GGG 11
--------------------------------------------------

Codon position x base (3x4) table for each sequence.

#1: species1       
position  1:    T:0.18989    C:0.25524    A:0.25277    G:0.30210
position  2:    T:0.26017    C:0.29470    A:0.27497    G:0.17016
position  3:    T:0.17386    C:0.33785    A:0.24908    G:0.23921
Average         T:0.20797    C:0.29593    A:0.25894    G:0.23716

#2: species2         
position  1:    T:0.19296    C:0.25211    A:0.24648    G:0.30845
position  2:    T:0.27183    C:0.30704    A:0.26620    G:0.15493
position  3:    T:0.20141    C:0.31831    A:0.22958    G:0.25070
Average         T:0.22207    C:0.29249    A:0.24742    G:0.23803

#3: species3    
position  1:    T:0.18619    C:0.25031    A:0.25771    G:0.30580
position  2:    T:0.25771    C:0.30210    A:0.26634    G:0.17386
position  3:    T:0.19729    C:0.31936    A:0.24291    G:0.24044
Average         T:0.21373    C:0.29059    A:0.25565    G:0.24003

#4: species4   
position  1:    T:0.20664    C:0.23616    A:0.26322    G:0.29397
position  2:    T:0.26568    C:0.29766    A:0.27306    G:0.16359
position  3:    T:0.16236    C:0.37638    A:0.21525    G:0.24600
Average         T:0.21156    C:0.30340    A:0.25051    G:0.23452

#5: species5       
position  1:    T:0.19876    C:0.24348    A:0.25839    G:0.29938
position  2:    T:0.25342    C:0.31677    A:0.26832    G:0.16149
position  3:    T:0.18758    C:0.33416    A:0.23230    G:0.24596
Average         T:0.21325    C:0.29814    A:0.25300    G:0.23561

#6: species6      
position  1:    T:0.19892    C:0.24899    A:0.24493    G:0.30717
position  2:    T:0.26522    C:0.30041    A:0.26387    G:0.17050
position  3:    T:0.17591    C:0.35047    A:0.22057    G:0.25304
Average         T:0.21335    C:0.29995    A:0.24312    G:0.24357

#7: species7      
position  1:    T:0.20000    C:0.24121    A:0.26424    G:0.29455
position  2:    T:0.25818    C:0.32000    A:0.26303    G:0.15879
position  3:    T:0.16606    C:0.34909    A:0.23636    G:0.24848
Average         T:0.20808    C:0.30343    A:0.25455    G:0.23394

Sums of codon usage counts
------------------------------------------------------------------------------
Phe F TTT      93 | Ser S TCT      61 | Tyr Y TAT      41 | Cys C TGT      65
      TTC     141 |       TCC     116 |       TAC      92 |       TGC      36
Leu L TTA      35 |       TCA     129 | *** * TAA       0 | *** * TGA       0
      TTG      99 |       TCG     114 |       TAG       0 | Trp W TGG      60
------------------------------------------------------------------------------
Leu L CTT      97 | Pro P CCT      60 | His H CAT      43 | Arg R CGT      35
      CTC     108 |       CCC     132 |       CAC     144 |       CGC      98
      CTA      50 |       CCA     116 | Gln Q CAA     125 |       CGA      40
      CTG      96 |       CCG      57 |       CAG     103 |       CGG      56
------------------------------------------------------------------------------
Ile I ATT      45 | Thr T ACT      39 | Asn N AAT      23 | Ser S AGT      52
      ATC     117 |       ACC     129 |       AAC     100 |       AGC      89
      ATA      75 |       ACA     155 | Lys K AAA      83 | Arg R AGA      43
Met M ATG     157 |       ACG      89 |       AAG     130 |       AGG      84
------------------------------------------------------------------------------
Val V GTT      67 | Ala A GCT      87 | Asp D GAT     116 | Gly G GGT      70
      GTC     127 |       GCC     187 |       GAC     201 |       GGC      64
      GTA      52 |       GCA     151 | Glu E GAA     168 |       GGA      60
      GTG      83 |       GCG      63 |       GAG     109 |       GGG      57
------------------------------------------------------------------------------

(Ambiguity data are not used in the counts.)


Codon position x base (3x4) table, overall

position  1:    T:0.19623    C:0.24664    A:0.25571    G:0.30141
position  2:    T:0.26152    C:0.30559    A:0.26804    G:0.16485
position  3:    T:0.18027    C:0.34113    A:0.23250    G:0.24610
Average         T:0.21267    C:0.29779    A:0.25209    G:0.23746


Nei & Gojobori 1986. dN/dS (dN, dS)
(Pairwise deletion)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

species1            
species2               0.2598 (0.0599 0.2306)
species3          0.2532 (0.0528 0.2085) 0.2778 (0.0189 0.0680)
species4         0.2815 (0.1116 0.3966) 0.1905 (0.0738 0.3873) 0.2555 (0.0981 0.3838)
species5             0.2780 (0.0654 0.2351) 0.2611 (0.0631 0.2419) 0.2487 (0.0552 0.2221) 0.2993 (0.0908 0.3034)
species6            0.2041 (0.0693 0.3396) 0.1785 (0.0613 0.3437) 0.2147 (0.0644 0.2997) 0.2510 (0.0598 0.2384) 0.2261 (0.0511 0.2260)
species7            0.2374 (0.0890 0.3748) 0.2080 (0.0819 0.3935) 0.2272 (0.0787 0.3465) 0.2415 (0.0676 0.2797) 0.2646 (0.0731 0.2764) 0.1821 (0.0176 0.0967)


TREE #  1:  (((1, (2, 3)), 5), (6, 4), 7);   MP score: -1
lnL(ntime: 11  np: 14):  -7469.732728      +0.000000
   8..9     9..10   10..1    10..11   11..2    11..3     9..5     8..12   12..6    12..4     8..7  

 0.179837 0.082919 0.172587 0.087525 0.067422 0.032013 0.124010 0.001030 0.062291 0.297695 0.117429 2.800021 0.731929 0.083728

Note: Branch length is defined as number of nucleotide substitutions per codon 
(not per neucleotide site).

tree length =   1.22476

(((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010): 0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429);

(((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525): 0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4: 0.297695): 0.001030, species7: 0.117429);

Detailed output identifying parameters

kappa (ts/tv) =  2.80002


dN/dS (w) for site classes (K=2)

p:   0.73193  0.26807
w:   0.08373  1.00000

dN & dS for each branch

 branch          t       N       S   dN/dS      dN      dS  N*dN  S*dS

   8..9       0.180   1857.3    725.7   0.3294   0.0381   0.1158   70.8   84.0
   9..10      0.083   1857.3    725.7   0.3294   0.0176   0.0534   32.7   38.7
  10..1       0.173   1857.3    725.7   0.3294   0.0366   0.1111   68.0   80.6
  10..11      0.088   1857.3    725.7   0.3294   0.0186   0.0563   34.5   40.9
  11..2       0.067   1857.3    725.7   0.3294   0.0143   0.0434   26.6   31.5
  11..3       0.032   1857.3    725.7   0.3294   0.0068   0.0206   12.6   15.0
   9..5       0.124   1857.3    725.7   0.3294   0.0263   0.0798   48.8   57.9
   8..12      0.001   1857.3    725.7   0.3294   0.0002   0.0007    0.4    0.5
  12..6       0.062   1857.3    725.7   0.3294   0.0132   0.0401   24.5   29.1
  12..4       0.298   1857.3    725.7   0.3294   0.0631   0.1917  117.2  139.1
   8..7       0.117   1857.3    725.7   0.3294   0.0249   0.0756   46.2   54.9


Time used:  0:10

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

> Can you get codeml to output what you need in another format, such as NEXUS?

I haven't tried that, but as you can see, the output is very verbose and NEXUS
does not seem to be an option.

Ultimately, I want to parse this to get all the information I need into a
tabulated file. I am still working out exactly what I need (there are standard
values to extract, such as lnL, branch lengths and dN/dS, but it also depends
on the type of downstream analysis). I will now work on the pypaml class and
modify the original code to make it more generic (it seems that it only works
for Site models).
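
As a rough illustration of the kind of extraction described above, here is a
minimal sketch in plain Python. It is not the pypaml code and not an existing
Biopython API: the names parse_codeml, branches_to_tsv and BRANCH_COLUMNS are
made up for this example, and the regular expressions assume the line layout of
the PAML 4.4 output shown above (other models, for example BranchSite, print
their site classes differently and would need extra patterns). It also assumes
the output has not been re-wrapped by a mail client.

```python
import csv
import io
import re

# Column names copied from the "dN & dS for each branch" header line.
BRANCH_COLUMNS = ["branch", "t", "N", "S", "dN/dS", "dN", "dS", "N*dN", "S*dS"]

# A branch row is a node pair such as "8..9" followed by eight decimal numbers.
BRANCH_ROW = re.compile(
    r"^[ \t]*(\d+\.\.\d+)((?:[ \t]+-?\d+\.\d+){8})[ \t]*$", re.M
)


def _floats(pattern, text):
    """Return the numbers captured by group 1 of `pattern` as floats (or [])."""
    match = re.search(pattern, text, re.M)
    return [float(v) for v in match.group(1).split()] if match else []


def parse_codeml(text):
    """Pull a few standard values out of codeml main-output text (hypothetical helper)."""
    result = {}
    match = re.search(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)", text)
    if match:
        result["ntime"] = int(match.group(1))
        result["np"] = int(match.group(2))
        result["lnL"] = float(match.group(3))
    for key, pattern in (
        ("kappa", r"kappa \(ts/tv\)\s*=\s*(\d+\.\d+)"),
        ("tree_length", r"tree length\s*=\s*(\d+\.\d+)"),
    ):
        values = _floats(pattern, text)
        if values:
            result[key] = values[0]
    # Site-class proportions and omegas, as printed for the NearlyNeutral model.
    result["site_class_p"] = _floats(r"^p:((?:[ \t]+\d+\.\d+)+)", text)
    result["site_class_w"] = _floats(r"^w:((?:[ \t]+\d+\.\d+)+)", text)
    # One dict per row of the per-branch table.
    result["branches"] = [
        dict(zip(BRANCH_COLUMNS,
                 [m.group(1)] + [float(v) for v in m.group(2).split()]))
        for m in BRANCH_ROW.finditer(text)
    ]
    return result


def branches_to_tsv(branches):
    """Render the branch rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=BRANCH_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(branches)
    return buffer.getvalue()


# A few lines copied from the output above, used as test input.
SAMPLE = """\
lnL(ntime: 11  np: 14):  -7469.732728      +0.000000
tree length =   1.22476
kappa (ts/tv) =  2.80002
p:   0.73193  0.26807
w:   0.08373  1.00000
 branch          t       N       S   dN/dS      dN      dS  N*dN  S*dS
   8..9       0.180   1857.3    725.7   0.3294   0.0381   0.1158   70.8   84.0
  12..4       0.298   1857.3    725.7   0.3294   0.0631   0.1917  117.2  139.1
"""

parsed = parse_codeml(SAMPLE)
print(parsed["lnL"], parsed["kappa"], parsed["tree_length"])
print(branches_to_tsv(parsed["branches"]), end="")
```

Keeping the scalar values in a dictionary and the branch rows in a list of
dictionaries means that which columns end up in the tabulated file can be
decided later, once the downstream analysis is settled.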

I will let you know; I was just wondering whether there was already a
solution. There is one in BioPerl, but I have heard it is very slow and, in any
case, I don't understand much Perl.
Thanks, 
Anastasia


      


