[Bioperl-l] PAML/Codeml parsing

Stefan Kirov stefan.kirov at bms.com
Wed Dec 5 14:35:23 UTC 2007


Here are the files.
Stefan
Stefan Kirov wrote:
> Jason,
> When there is a gapless alignment we have a differently formatted output
> from codeml:
> kirovs@horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
>
> seed used = 492211105
>       3    141
>
> ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA
>
> And parsing this fails...
> The next one has gaps and works fine:
>
> kirovs@horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
>
> seed used = 492252697
>
> Before deleting alignment gaps
>       2    162
>
> ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
> CTT GGT TCA GGA GGT CAG TTC CTG
> ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
> CCT GGT ACA GGA AAC AAG CTT CTG
>
> I will send both complete files as an attachment in a separate mail (I do
> not know whether attachments will get through to the list).
> My guess is that the whole _parse_summary method has to be re-worked as
> there is no tag to look for before the sequences start. Ugly.
> I am not sure what else could become broken if I try to fix it, so I
> will leave it to you.
> Stefan
>   
>> should be fixed.
>>
>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
>> revision 1.56
>> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
>> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
>> order for the sequences and summary results in
>> the top of the MLC files
>>
>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>>
>>     
>>> Jason Stajich wrote:
>>>       
>>>> PAML4 breaks our PAML parser right now because the order of things in
>>>> the result file has changed.  Now sequences precede the information
>>>> about the version or the program run.  This means that
>>>> $result->get_seqs() fails because we don't parse the sequences.
>>>> We'll see what we can do, but as usual with supporting 3rd party
>>>> programs it is brittle when file formats change.
>>>>
>>>> -jason
>>>>
>>>> -- 
>>>> Jason Stajich
>>>> jason at bioperl.org
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>         
>>> Jason,
>>> I saw a commit to codeml after this post, but none to PAML.pm. I assume
>>> this is not fixed yet, am I correct?
>>> Thanks!
>>> Stefan
>>>       
>>     
>
>   
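
One possible way around the missing tag, offered only as a minimal sketch: in both outputs quoted above, the line holding two integers (number of sequences, number of sites) comes right before the sequences, so it could serve as the anchor instead of a text tag. The sketch is in Python rather than Perl, it is not the Bio::Tools::Phylo::PAML code, and the names parse_mlc_header and DIMS are made up for illustration. It assumes each sequence name is a single token without whitespace and that the second integer counts nucleotides including gap characters.

```python
import re

# Hypothetical pattern: a line holding exactly two integers, e.g. "      3    141".
DIMS = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def parse_mlc_header(text):
    """Return (ntaxa, nsites, {name: sequence}) for the first sequence block.

    Works whether or not a "Before deleting alignment gaps" line precedes
    the dimensions line, because only the dimensions line is used as anchor.
    """
    lines = iter(text.splitlines())

    # Skip everything (seed line, optional gap notice) up to the dimensions.
    for line in lines:
        match = DIMS.match(line)
        if match:
            ntaxa, nsites = int(match.group(1)), int(match.group(2))
            break
    else:
        raise ValueError("no 'ntaxa nsites' line found")

    seqs = {}
    name, chunks, length = None, [], 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank separator line
        if name is None:
            # First token of a new record is the sequence name.
            name, tokens = tokens[0], tokens[1:]
        for token in tokens:
            chunks.append(token)
            length += len(token)
        if length >= nsites:
            # Record is complete, even if it was wrapped over several lines.
            seqs[name] = "".join(chunks)
            name, chunks, length = None, [], 0
            if len(seqs) == ntaxa:
                break

    if len(seqs) != ntaxa:
        raise ValueError("expected %d sequences, found %d" % (ntaxa, len(seqs)))
    return ntaxa, nsites, seqs
```

Calling parse_mlc_header on the text of either mlc file should give the sequence count, the site count, and a dict of sequences keyed by name. Counting characters against the declared site count, rather than relying on line breaks, is a deliberate choice here so that wrapped sequence lines are still joined correctly. Only the first sequence block is read; anything later in the file is ignored.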

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc.tar.gz
Type: application/x-gzip
Size: 3237 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment-0004.gz>

