<html>
<head>
<style>
body {
font-family: Verdana, sans-serif;
font-size: 0.8em;
color:#484848;
}
h1, h2, h3 { font-family: "Trebuchet MS", Verdana, sans-serif; margin: 0px; }
h1 { font-size: 1.2em; }
h2, h3 { font-size: 1.1em; }
a, a:link, a:visited { color: #2A5685;}
a:hover, a:active { color: #c61a1a; }
a.wiki-anchor { display: none; }
fieldset.attachments {border-width: 1px 0 0 0;}
hr {
width: 100%;
height: 1px;
background: #ccc;
border: 0;
}
span.footer {
font-size: 0.8em;
font-style: italic;
}
</style>
</head>
<body>
Issue #2693 has been updated by Vincent Davis.
<ul>
<li><strong>Description</strong> updated (<a title="View differences" href="https://redmine.open-bio.org/journals/diff/15346?detail_id=1619">diff</a>)</li>
<li><strong>Assignee</strong> changed from <i>Biopython Dev Mailing List</i> to <i>Vincent Davis</i></li>
</ul>
<hr />
<h1><a href="https://redmine.open-bio.org/issues/2693#change-15346">Bug #2693: LogisticRegression convergence criterion is too lenient</a></h1>
<ul><li>Author: Bruce Southey</li>
<li>Status: New</li>
<li>Priority: Normal</li>
<li>Assignee: Vincent Davis</li>
<li>Category: Main Distribution</li>
<li>Target version: Not Applicable</li>
</ul>
<p>For the example in the code and tutorial, R and SAS both give the following parameter estimates:</p>
<p>Intercept = 18.9622<br />x1 = -0.0714<br />x2 = 0.0444</p>
<p>By default, Bio/LogisticRegression.py defines the following parameters:<br /> MAX_ITERATIONS = 500<br /> CONVERGE_THRESHOLD = 0.01</p>
<p>The convergence threshold is too lenient, so the iterations terminate before the expected values are reached. A more stringent criterion (CONVERGE_THRESHOLD = 0.000000001) permits convergence to the R/SAS values, provided MAX_ITERATIONS is greater than 7761 on my system.</p>
<p>MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the Bio/LogisticRegression.py module, but they should be part of the API of the train function, for example:<br />def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD = 0.000000001, MAX_ITERATIONS=10000):</p>
<p>Note that the algorithm used requires a large number of iterations, and the train function does not report the degree of convergence attained when MAX_ITERATIONS is exceeded.</p>
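<p>To illustrate the two points above, here is a minimal, self-contained sketch of a training loop that takes the threshold and iteration cap as keyword arguments and reports how far it got. It is not the Bio.LogisticRegression code: train_sketch, its step parameter, and its return tuple are hypothetical names, plain gradient ascent stands in for the module's algorithm, and the data are toy values chosen only for illustration.</p>

```python
import math


def log_likelihood(beta, xs, ys):
    """Log-likelihood of a logistic model; beta[0] is the intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        # y*z - log(1 + exp(z)), with the log term written to avoid overflow
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def train_sketch(xs, ys, converge_threshold=1e-9, max_iterations=10000,
                 step=0.1):
    """Plain gradient ascent (NOT the Bio.LogisticRegression algorithm).

    Returns (beta, iterations, last_diff, converged) so the caller can see
    how close the fit was when the loop stopped.
    """
    beta = [0.0] * (len(xs[0]) + 1)
    old_llik = log_likelihood(beta, xs, ys)
    last_diff = float("inf")
    for iteration in range(1, max_iterations + 1):
        # Gradient of the log-likelihood: sum of (y - p) times each input
        grad = [0.0] * len(beta)
        for x, y in zip(xs, ys):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            residual = y - 1.0 / (1.0 + math.exp(-z))
            grad[0] += residual
            for j, v in enumerate(x):
                grad[j + 1] += residual * v
        beta = [b + step * g / len(xs) for b, g in zip(beta, grad)]
        llik = log_likelihood(beta, xs, ys)
        last_diff = abs(llik - old_llik)
        old_llik = llik
        if converge_threshold >= last_diff:
            return beta, iteration, last_diff, True
    # Iteration cap reached: report the last change instead of hiding it
    return beta, max_iterations, last_diff, False


# Toy data (illustrative only, not the tutorial example)
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 1, 0, 1, 1]
for threshold in (0.01, 1e-9):
    beta, n, diff, ok = train_sketch(xs, ys, converge_threshold=threshold)
    print("threshold", threshold, "iterations", n, "diff", diff,
          "converged", ok, "beta", beta)
```

<p>Returning the last change in log-likelihood together with a converged flag is only one possible design; a warning when the cap is hit would serve the same purpose.</p>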
<p>Jeffrey Whitaker provides Python code using an alternative algorithm: <br /><a class="external" href="http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py">http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py</a></p>
<p>Furthermore, the update_fn should also be passed the previous likelihood, or the difference in likelihood, so that the actual convergence can be seen. Ideally the update_fn would be more general than this and able to display more information, but the attached patch provides the previous log-likelihood (old_llik).<br />def show_progress(iteration, old_llh, loglikelihood):<br /> print "Iteration:", iteration, "Old", old_llh, "Log-likelihood function:", loglikelihood, "Diff:", (old_llh-loglikelihood)</p>
<p>model = LogisticRegression.train(xs, ys, update_fn=show_progress)</p>
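<p>As a rough, standalone illustration of the three-argument callback described above, the sketch below fits an intercept-only logistic model and hands the previous and current log-likelihood to update_fn on every iteration. fit_intercept and its parameters are hypothetical and are not part of Bio.LogisticRegression; the callback is written with the Python 3 print function, and the loop uses simple gradient ascent rather than the module's algorithm.</p>

```python
import math


def show_progress(iteration, old_llh, loglikelihood):
    # Python 3 form of the callback from the report
    print("Iteration:", iteration, "Old:", old_llh,
          "Log-likelihood function:", loglikelihood,
          "Diff:", old_llh - loglikelihood)


def fit_intercept(ys, update_fn=None, converge_threshold=1e-9,
                  max_iterations=10000, step=1.0):
    """Hypothetical intercept-only logistic fit by gradient ascent."""

    def llh(b):
        return sum(y * b - math.log(1.0 + math.exp(b)) for y in ys)

    b = 0.0
    old_llh = llh(b)
    for iteration in range(1, max_iterations + 1):
        p = 1.0 / (1.0 + math.exp(-b))
        b += step * sum(y - p for y in ys) / len(ys)
        new_llh = llh(b)
        if update_fn is not None:
            # Pass both values so the caller can judge convergence itself
            update_fn(iteration, old_llh, new_llh)
        if converge_threshold >= abs(new_llh - old_llh):
            break
        old_llh = new_llh
    return b


# Toy responses (illustrative only)
intercept = fit_intercept([0, 1, 1, 1], update_fn=show_progress)
```

<p>With both values available, a callback can print the difference, log it, or apply its own stopping diagnostics without any change to the training loop.</p>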
<fieldset class="attachments"><legend>Files</legend>
<a href="https://redmine.open-bio.org/attachments/download/1134/logreg.diff">logreg.diff</a>
(2.34 KB)<br />
</fieldset>
</body>
</html>