[Biopython] Codeml parser in Biopython?
natassa
natassa_g_2000 at yahoo.com
Tue Sep 14 08:02:18 UTC 2010
Hi Peter,
> Could you post a short example of the kind of output you are looking at?
Here is an example output, but this can differ depending on the model used
(there are several models for Branch, Site, BranchSite, but all are pretty
standard)
-------------------------------------------------------------------------------OUTPUT-------------------------
seed used = 808671289
CODONML (in paml version 4.4, January 2010) align.phy
Model: One dN/dS ratio for branches
Codon frequency model: F3x4
Site-class models: NearlyNeutral
ns = 7 ls = 861
Codon usage in sequences
--------------------------------------------------------------------------------------------------------------
Phe TTT 12 14 15 14 14 12 | Ser TCT 6 11 12 8 10 6 | Tyr TAT 5 5 4 7 9 5 | Cys TGT 11 8 10 9 11 8
TTC 23 18 18 20 20 20 | TCC 16 13 16 19 16 18 | TAC 11 12 13 17 11 13 | TGC 6 2 6 6 4 6
Leu TTA 8 5 6 5 4 2 | TCA 17 16 18 20 21 15 | *** TAA 0 0 0 0 0 0 | *** TGA 0 0 0 0 0 0
TTG 13 11 11 15 15 17 | TCG 17 14 14 17 17 18 | TAG 0 0 0 0 0 0 | Trp TGG 9 8 8 11 8 7
--------------------------------------------------------------------------------------------------------------
Leu CTT 13 15 16 11 12 16 | Pro CCT 7 7 10 6 10 8 | His CAT 8 7 8 4 6 5 | Arg CGT 6 4 5 4 5 5
CTC 14 14 13 19 14 15 | CCC 20 13 16 24 19 20 | CAC 23 18 22 20 24 17 | CGC 14 13 15 14 14 15
CTA 6 4 8 7 6 9 | CCA 19 18 19 11 17 16 | Gln CAA 20 16 20 21 18 13 | CGA 8 4 6 5 6 6
CTG 17 17 14 14 16 10 | CCG 7 8 8 9 6 8 | CAG 18 14 15 14 14 13 | CGG 7 7 8 9 9 8
--------------------------------------------------------------------------------------------------------------
Ile ATT 6 7 9 5 7 6 | Thr ACT 5 7 7 7 5 4 | Asn AAT 3 3 4 2 5 2 | Ser AGT 7 7 9 8 7 7
ATC 16 13 15 23 14 16 | ACC 21 14 17 20 20 16 | AAC 12 14 14 21 14 11 | AGC 14 13 14 15 11 10
ATA 13 9 10 11 11 10 | ACA 19 17 22 22 28 18 | Lys AAA 17 8 13 9 13 12 | Arg AGA 11 5 8 4 6 5
Met ATG 23 21 23 22 23 20 | ACG 11 12 12 12 14 13 | AAG 18 15 19 19 18 18 | AGG 9 10 13 14 12 13
--------------------------------------------------------------------------------------------------------------
Val GTT 8 13 10 10 10 6 | Ala GCT 13 10 12 12 14 13 | Asp GAT 18 18 17 15 15 17 | Gly GGT 13 7 12 10 11 10
GTC 18 13 18 20 19 21 | GCC 28 26 28 28 28 23 | GAC 29 21 26 33 29 30 | GGC 9 9 8 7 12 8
GTA 8 8 9 7 6 7 | GCA 22 22 24 17 23 19 | Glu GAA 27 24 24 27 21 22 | GGA 7 7 10 9 7 9
GTG 13 11 14 13 13 9 | GCG 11 10 10 10 7 7 | GAG 14 14 17 13 19 17 | GGG 7 6 9 8 7 9
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------
Phe TTT 12 | Ser TCT 8 | Tyr TAT 6 | Cys TGT 8
TTC 22 | TCC 18 | TAC 15 | TGC 6
Leu TTA 5 | TCA 22 | *** TAA 0 | *** TGA 0
TTG 17 | TCG 17 | TAG 0 | Trp TGG 9
--------------------------------------------------
Leu CTT 14 | Pro CCT 12 | His CAT 5 | Arg CGT 6
CTC 19 | CCC 20 | CAC 20 | CGC 13
CTA 10 | CCA 16 | Gln CAA 17 | CGA 5
CTG 8 | CCG 11 | CAG 15 | CGG 8
--------------------------------------------------
Ile ATT 5 | Thr ACT 4 | Asn AAT 4 | Ser AGT 7
ATC 20 | ACC 21 | AAC 14 | AGC 12
ATA 11 | ACA 29 | Lys AAA 11 | Arg AGA 4
Met ATG 25 | ACG 15 | AAG 23 | AGG 13
--------------------------------------------------
Val GTT 10 | Ala GCT 13 | Asp GAT 16 | Gly GGT 7
GTC 18 | GCC 26 | GAC 33 | GGC 11
GTA 7 | GCA 24 | Glu GAA 23 | GGA 11
GTG 10 | GCG 8 | GAG 15 | GGG 11
--------------------------------------------------
Codon position x base (3x4) table for each sequence.
#1: species1
position 1: T:0.18989 C:0.25524 A:0.25277 G:0.30210
position 2: T:0.26017 C:0.29470 A:0.27497 G:0.17016
position 3: T:0.17386 C:0.33785 A:0.24908 G:0.23921
Average T:0.20797 C:0.29593 A:0.25894 G:0.23716
#2: species2
position 1: T:0.19296 C:0.25211 A:0.24648 G:0.30845
position 2: T:0.27183 C:0.30704 A:0.26620 G:0.15493
position 3: T:0.20141 C:0.31831 A:0.22958 G:0.25070
Average T:0.22207 C:0.29249 A:0.24742 G:0.23803
#3: species3
position 1: T:0.18619 C:0.25031 A:0.25771 G:0.30580
position 2: T:0.25771 C:0.30210 A:0.26634 G:0.17386
position 3: T:0.19729 C:0.31936 A:0.24291 G:0.24044
Average T:0.21373 C:0.29059 A:0.25565 G:0.24003
#4: species4
position 1: T:0.20664 C:0.23616 A:0.26322 G:0.29397
position 2: T:0.26568 C:0.29766 A:0.27306 G:0.16359
position 3: T:0.16236 C:0.37638 A:0.21525 G:0.24600
Average T:0.21156 C:0.30340 A:0.25051 G:0.23452
#5: species5
position 1: T:0.19876 C:0.24348 A:0.25839 G:0.29938
position 2: T:0.25342 C:0.31677 A:0.26832 G:0.16149
position 3: T:0.18758 C:0.33416 A:0.23230 G:0.24596
Average T:0.21325 C:0.29814 A:0.25300 G:0.23561
#6: species6
position 1: T:0.19892 C:0.24899 A:0.24493 G:0.30717
position 2: T:0.26522 C:0.30041 A:0.26387 G:0.17050
position 3: T:0.17591 C:0.35047 A:0.22057 G:0.25304
Average T:0.21335 C:0.29995 A:0.24312 G:0.24357
#7: species7
position 1: T:0.20000 C:0.24121 A:0.26424 G:0.29455
position 2: T:0.25818 C:0.32000 A:0.26303 G:0.15879
position 3: T:0.16606 C:0.34909 A:0.23636 G:0.24848
Average T:0.20808 C:0.30343 A:0.25455 G:0.23394
Sums of codon usage counts
------------------------------------------------------------------------------
Phe F TTT 93 | Ser S TCT 61 | Tyr Y TAT 41 | Cys C TGT 65
TTC 141 | TCC 116 | TAC 92 | TGC 36
Leu L TTA 35 | TCA 129 | *** * TAA 0 | *** * TGA 0
TTG 99 | TCG 114 | TAG 0 | Trp W TGG 60
------------------------------------------------------------------------------
Leu L CTT 97 | Pro P CCT 60 | His H CAT 43 | Arg R CGT 35
CTC 108 | CCC 132 | CAC 144 | CGC 98
CTA 50 | CCA 116 | Gln Q CAA 125 | CGA 40
CTG 96 | CCG 57 | CAG 103 | CGG 56
------------------------------------------------------------------------------
Ile I ATT 45 | Thr T ACT 39 | Asn N AAT 23 | Ser S AGT 52
ATC 117 | ACC 129 | AAC 100 | AGC 89
ATA 75 | ACA 155 | Lys K AAA 83 | Arg R AGA 43
Met M ATG 157 | ACG 89 | AAG 130 | AGG 84
------------------------------------------------------------------------------
Val V GTT 67 | Ala A GCT 87 | Asp D GAT 116 | Gly G GGT 70
GTC 127 | GCC 187 | GAC 201 | GGC 64
GTA 52 | GCA 151 | Glu E GAA 168 | GGA 60
GTG 83 | GCG 63 | GAG 109 | GGG 57
------------------------------------------------------------------------------
(Ambiguity data are not used in the counts.)
Codon position x base (3x4) table, overall
position 1: T:0.19623 C:0.24664 A:0.25571 G:0.30141
position 2: T:0.26152 C:0.30559 A:0.26804 G:0.16485
position 3: T:0.18027 C:0.34113 A:0.23250 G:0.24610
Average T:0.21267 C:0.29779 A:0.25209 G:0.23746
Nei & Gojobori 1986. dN/dS (dN, dS)
(Pairwise deletion)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)
species1
species2 0.2598 (0.0599 0.2306)
species3 0.2532 (0.0528 0.2085) 0.2778 (0.0189 0.0680)
species4 0.2815 (0.1116 0.3966) 0.1905 (0.0738 0.3873) 0.2555 (0.0981 0.3838)
species5 0.2780 (0.0654 0.2351) 0.2611 (0.0631 0.2419) 0.2487 (0.0552 0.2221) 0.2993 (0.0908 0.3034)
species6 0.2041 (0.0693 0.3396) 0.1785 (0.0613 0.3437) 0.2147 (0.0644 0.2997) 0.2510 (0.0598 0.2384) 0.2261 (0.0511 0.2260)
species7 0.2374 (0.0890 0.3748) 0.2080 (0.0819 0.3935) 0.2272 (0.0787 0.3465) 0.2415 (0.0676 0.2797) 0.2646 (0.0731 0.2764) 0.1821 (0.0176 0.0967)
TREE # 1: (((1, (2, 3)), 5), (6, 4), 7); MP score: -1
lnL(ntime: 11 np: 14): -7469.732728 +0.000000
8..9 9..10 10..1 10..11 11..2 11..3 9..5 8..12 12..6 12..4 8..7
0.179837 0.082919 0.172587 0.087525 0.067422 0.032013 0.124010 0.001030 0.062291 0.297695 0.117429 2.800021 0.731929 0.083728
Note: Branch length is defined as number of nucleotide substitutions per codon
(not per neucleotide site).
tree length = 1.22476
(((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010): 0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429);
(((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525): 0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4: 0.297695): 0.001030, species7: 0.117429);
Detailed output identifying parameters
kappa (ts/tv) = 2.80002
dN/dS (w) for site classes (K=2)
p: 0.73193 0.26807
w: 0.08373 1.00000
dN & dS for each branch
branch t N S dN/dS dN dS N*dN S*dS
8..9 0.180 1857.3 725.7 0.3294 0.0381 0.1158 70.8 84.0
9..10 0.083 1857.3 725.7 0.3294 0.0176 0.0534 32.7 38.7
10..1 0.173 1857.3 725.7 0.3294 0.0366 0.1111 68.0 80.6
10..11 0.088 1857.3 725.7 0.3294 0.0186 0.0563 34.5 40.9
11..2 0.067 1857.3 725.7 0.3294 0.0143 0.0434 26.6 31.5
11..3 0.032 1857.3 725.7 0.3294 0.0068 0.0206 12.6 15.0
9..5 0.124 1857.3 725.7 0.3294 0.0263 0.0798 48.8 57.9
8..12 0.001 1857.3 725.7 0.3294 0.0002 0.0007 0.4 0.5
12..6 0.062 1857.3 725.7 0.3294 0.0132 0.0401 24.5 29.1
12..4 0.298 1857.3 725.7 0.3294 0.0631 0.1917 117.2 139.1
8..7 0.117 1857.3 725.7 0.3294 0.0249 0.0756 46.2 54.9
Time used: 0:10
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Can you get codeml to output what you need in another format, such as NEXUS?
I haven't tried that, but as you can see, this is very verbose output and NEXUS
does not seem to be an option.
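
That said, the two tree lines in the output above are plain Newick strings, so
the terminal branch lengths could be pulled out without a full parser. A
minimal sketch, assuming the numbered and labelled trees directly follow the
"tree length" line as they do in the sample; extract_trees and leaf_lengths are
made-up names, not part of Biopython or pypaml:

```python
import re

# Hypothetical helpers, written against the sample output in this message.
TREE_RE = re.compile(r"\([^;]*;")
# Assumption: leaf names look like identifiers (letters, digits, underscore).
LEAF_RE = re.compile(r"([A-Za-z_]\w*)\s*:\s*(\d+\.\d+)")


def extract_trees(text):
    """Return the Newick strings that follow the 'tree length' line."""
    # Assumption: the numbered tree and the labelled tree come right after
    # 'tree length = ...', as in the sample output.
    _, found, tail = text.partition("tree length")
    if not found:
        return []
    # Collapse whitespace so trees wrapped over several lines become one line.
    return [" ".join(t.split()) for t in TREE_RE.findall(tail)[:2]]


def leaf_lengths(newick):
    """Map each named leaf to its terminal branch length."""
    return {name: float(length) for name, length in LEAF_RE.findall(newick)}


SAMPLE = """\
tree length = 1.22476
(((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010):
0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429);
(((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525):
0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4:
0.297695): 0.001030, species7: 0.117429);
Detailed output identifying parameters
"""

numbered_tree, labelled_tree = extract_trees(SAMPLE)
lengths = leaf_lengths(labelled_tree)
for name in sorted(lengths):
    print(name, lengths[name])
```

The joined string should also be readable by a Newick parser such as Bio.Phylo,
though that is not tested here.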
Ultimately, I want to parse this to get all the information I need into a
tabulated file. I am still working out exactly what I need (there are standard
values to extract, such as lnL, branch lengths and dN/dS, but it also depends
on the type of downstream analysis). I will now work on the pypaml class and
modify the original code to make it more generic (it seems that it only works
for Site Models).
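
A minimal sketch of that kind of tabulated extraction, using only the standard
library. This is not the pypaml code: the function names are hypothetical, and
the patterns are written against the sample output above, so other models
(Branch, BranchSite) would probably need additional patterns.

```python
import csv
import io
import re

# Hypothetical names; patterns are written against the sample output above.
LNL_RE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")
KAPPA_RE = re.compile(r"kappa \(ts/tv\)\s*=\s*(\d+\.\d+)")
TREE_LENGTH_RE = re.compile(r"tree length\s*=\s*(\d+\.\d+)")
BRANCH_LABEL_RE = re.compile(r"\d+\.\.\d+")
BRANCH_COLUMNS = ["branch", "t", "N", "S", "dN/dS", "dN", "dS", "N*dN", "S*dS"]


def parse_summary(text):
    """Collect lnL, kappa and tree length, skipping any value not found."""
    summary = {}
    m = LNL_RE.search(text)
    if m:
        summary["ntime"] = int(m.group(1))
        summary["np"] = int(m.group(2))
        summary["lnL"] = float(m.group(3))
    m = KAPPA_RE.search(text)
    if m:
        summary["kappa"] = float(m.group(1))
    m = TREE_LENGTH_RE.search(text)
    if m:
        summary["tree_length"] = float(m.group(1))
    return summary


def parse_branch_table(text):
    """Return one dict per row of the 'dN & dS for each branch' table."""
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if line.strip().startswith("dN & dS for each branch"):
            in_table = True
        elif in_table and len(fields) == 9 and BRANCH_LABEL_RE.fullmatch(fields[0]):
            row = {"branch": fields[0]}
            for column, value in zip(BRANCH_COLUMNS[1:], fields[1:]):
                row[column] = float(value)
            rows.append(row)
    return rows


def branches_to_tsv(rows):
    """Render the branch rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=BRANCH_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE = """\
lnL(ntime: 11 np: 14): -7469.732728 +0.000000
tree length = 1.22476
kappa (ts/tv) = 2.80002
dN & dS for each branch
branch t N S dN/dS dN dS N*dN S*dS
8..9 0.180 1857.3 725.7 0.3294 0.0381 0.1158 70.8 84.0
12..4 0.298 1857.3 725.7 0.3294 0.0631 0.1917 117.2 139.1
Time used: 0:10
"""

summary = parse_summary(SAMPLE)
rows = parse_branch_table(SAMPLE)
tsv = branches_to_tsv(rows)
print(summary)
print(tsv)
```

Waiting for the table heading before accepting rows keeps the earlier line of
branch labels (8..9 9..10 ...) from being mistaken for table data.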
I will let you know; I was just wondering if there was already a solution.
There is one in BioPerl, but I have heard it is very slow and, in any case, I
don't understand much Perl...
Thanks,
Anastasia