[Bioperl-l] model 3 on Codeml.pm
Lorenzo Carretero Paulet
locarpau at upvnet.upv.es
Wed Jun 8 20:21:08 UTC 2011
Jason,
This is the exact error message:
--------------------- WARNING ---------------------
MSG: parameter model specified value 3 is not recognized, please see the
documentation and the code for this module or set the no_param_checks to
a true value
---------------------------------------------------
See below a version of the script I'm trying to run. It runs well for
model values 0,1 and 2 and, but retruns taht message with 3 (even after
editing codeml.pm). It is within a subroutine to which I pass aa and nt
sequences files as well as the tree file. Actually I'm only interested
in the resulting lnL of the data to do LRT comparisons. I ran PAML
manually and, with the same datasets, the lnL values for model 3 are not
the same as those returned when ran from the script.
Thanks,
Lorenzo
PS:
my %hashNTseqs = ();
my $dna_aln;
my $pep_aln;
my $lnL;
my $seq_ids;
#Create a Bio::SeqIO object with the pep sequences
my $inseq_pep = Bio::SeqIO->new(-file => "<$sequencesfilenameAA",
-format => $format );
# Remove the stop (*) from the end of every protein sequence
while (my $seqin = $inseq_pep->next_seq)
{
my $seq = $seqin->seq();
$seq =~ s/\*//g;
}
#Create a Bio::SeqIO object with the nt sequences
my $inseq_nt = Bio::SeqIO->new(-file => "<$sequencesfilenameNT",
-format => $format );
#Get an alignment of protein sequences
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new ();
#my $aln_factory = Bio::Tools::Run::Alignment::Muscle->new ();
#say "Aligning peptide sequences globally (clustalw) ...";
$pep_aln = $aln_factory->align($sequencesfilenameAA);
while (my $seqin = $inseq_nt->next_seq)
{
my $seq = $seqin->seq();
# Replace all Xs and missing characters (?) with Ns
$seq =~ s/X/N/gi;
$seq =~ s/\?/N/g;
my $seq_id = $seqin->display_id();
# Create a reference to a hash with keys : display_ids for the aa sequences in the alignment
# and the values are a Bio::PrimarySeqI object for the corresponding spliced cDNA sequence.
$hashNTseqs{$seq_id} = $seqin;
}
#Generating codon alignment on the basis of the corresponding peptide one
#say "Aligning codon sequences according to corresponding peptide MSA ...";
$dna_aln = aa_to_dna_aln($pep_aln, \%hashNTseqs);
#Remove all positions containing a codon gap from the codon alignment???
#say "positions: ",$dna_aln->num_residues();
$dna_aln = $dna_aln-> remove_columns(['gaps']);
#say "positions: ",$dna_aln->num_residues();
#Generating tree from input string of ids
my $io = IO::String->new($tree);
my $biotreeio = Bio::TreeIO->new(-fh => $io,
-format => 'newick');
my $biotree = $biotreeio->next_tree;
my $codeml_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
(
-alignment => $dna_aln,
-tree => $biotree,
-params => {
#'verbose' => 0,
#'noisy' => 9,
'runmode' => 0, #user tree
'seqtype' => 1,
'model' => 3,
'NSsites' => 3,
'fix_omega' => 0,
'omega' => 0,
'ncatG' => 2,
#'icode' => 0,
#'fix_alpha' => 0,
#'fix_kappa' => 0,
#'RateAncestor' => 0,
'CodonFreq' => 2,
'cleandata' => 1,
'ndata' => 1
},
);
#$codeml_factory->alignment($dna_aln);
#$codeml_factory->tree($biotree);
#$codeml_factory->set_parameter('model',2);
my ($rc,$parser) = $codeml_factory->run(); # or run($dna_aln,$biotree)
$codeml_factory->cleanup();
my $result = $parser->next_result();
#my $MLmatrix_free = $result_free->get_MLmatrix();
my $intree = $result->next_tree;
$lnL = $intree->score;
say "lnL=$lnL";
for my $node ( $intree->get_nodes )
{
my $id;
# first we do some work to figure out what the ID should be.
# for a leaf or tip node this is just the taxon label
if( $node->is_Leaf() )
{
$id = $node->id;
}
else
{
# for the internal nodes it is just the name of all the sub-nodes
# put together, much like how Sanderson represents internal nodes
# in r8s
$id = "(".join(",", map { $_->id } grep { $_->is_Leaf }$node->get_all_Descendents) .")";
}
if( ! $node->ancestor or ! $node->has_tag('t') )
{
# skip when no values have been associated with this node
# (like the root node)
next;
}
print join ("\t",$id,map { ($node->get_tag_values($_))[0] }qw(dN/dS)), "\n";
}
$result->reset_seqs;
El 08/06/11 22:04, Jason Stajich escribió:
> yes - as I said, a simple script that demonstrates the problem will make this all easier to debug. If you reported the actual error message that might be a good start. The msg before refers to a different parameter so I'm suspicious.
>
> On Jun 8, 2011, at 2:10 PM, Lorenzo Carretero Paulet wrote:
>
>> Jason,
>> I edited the Codeml.pm and posted the version in github. The only change I made was in line 275 'model' => [0..2,7] was changed to 'model' => [0..3,7] , so that 3 could be passed as valid value to model. However, I still got the same warning message, and I tested with different alignments and results are different from those obtained running PAML locally. Maybe there are additional changes to be made.
>> Lorenzo
>>
>> El 08/06/11 07:15, Jason Stajich escribió:
>>> it is in github so you can fork a version, make the change, and submit a patch which we can pick up.
>>>
>>> I am concerned that this change can't be tested without example code.
>>>
>>> Did you just edit the code and make sure your changes worked, the error message seems to refer to a different parameter
>>> (ncatG) not model.
>>>
>>> Thanks,
>>>
>>>> MSG: parameter ncatG specified value is not recognized, please see the documentation and the code for this module or set the no_param_checks to a true value.'
>>> On Jun 7, 2011, at 9:26 PM, Lorenzo Carretero wrote:
>>>
>>>> Hi,
>>>> I'm trying to run the clade model D (Model D: model = 3, NSsites = 3 ncatG = 2, See reference. Bielawski, J. P., and Z. Yang. 2004. A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. Journal of Molecular Evolution 59:121-132. and PAML 4.4 manual page 30) from the Bio::Tools::Run::Phylo::PAML::Codeml module. However, 3 is not among the valid values that can be passed to the module (line 275 'model' => [0..2,7],) and consequently the following Warning message is returned from lines 689-690:
>>>> 'MSG: parameter ncatG specified value is not recognized, please see the documentation and the code for this module or set the no_param_checks to a true value.'
>>>> Can line 275: 'model' => [0..2,7], be changed to 'model' => [0..3,7], to accept value 3 or additional changes must be done in other modules to properly run the so-called clade models of PAML.
>>>> Thanks,
>>>> Lorenzo
>>>>
>>>>
>>>> --
>>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>>> Lorenzo Carretero Paulet
>>>> Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
>>>> Integrative Systems Biology Group
>>>> C/ Ingeniero Fausto Elio s/n.
>>>> 46022 Valencia, Spain
>>>>
>>>> Phone: +34 963879934
>>>> Fax: +34 963877859
>>>> e-mail: locarpau at upvnet.upv.es
>>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> Lorenzo Carretero Paulet
>> Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
>> Integrative Systems Biology Group
>> C/ Ingeniero Fausto Elio s/n.
>> 46022 Valencia, Spain
>>
>> Phone: +34 963879934
>> Fax: +34 963877859
>> e-mail: locarpau at upvnet.upv.es
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Lorenzo Carretero Paulet
Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
Integrative Systems Biology Group
C/ Ingeniero Fausto Elio s/n.
46022 Valencia, Spain
Phone: +34 963879934
Fax: +34 963877859
e-mail: locarpau at upvnet.upv.es
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
More information about the Bioperl-l
mailing list