<html>

<head>

<style>

body {

  font-family: Verdana, sans-serif;

  font-size: 0.8em;

  color:#484848;

}

h1, h2, h3 { font-family: "Trebuchet MS", Verdana, sans-serif; margin: 0px; }

h1 { font-size: 1.2em; }

h2, h3 { font-size: 1.1em; }

a, a:link, a:visited { color: #2A5685;}

a:hover, a:active { color: #c61a1a; }

a.wiki-anchor { display: none; }

fieldset.attachments {border-width: 1px 0 0 0;}

hr {

  width: 100%;

  height: 1px;

  background: #ccc;

  border: 0;

}

span.footer {

  font-size: 0.8em;

  font-style: italic;

}

</style>

</head>

<body>

Issue #2693 has been updated by Vincent Davis.


<p>This seems worth looking into further. I will create a GitHub issue.</p>

<hr />

<h1><a href="https://redmine.open-bio.org/issues/2693#change-15334">Bug #2693: LogisticRegression convergence criterion is too lenient</a></h1>

<ul><li>Author: Bruce Southey</li>

<li>Status: New</li>

<li>Priority: Normal</li>

<li>Assignee: Biopython Dev Mailing List</li>

<li>Category: Main Distribution</li>

<li>Target version: Not Applicable</li>

<li>URL: </li></ul>

<p>In R and SAS, the example used in the code and the tutorial gives the following parameter estimates:</p>

        <p>Intercept =  18.9622<br />x1        =  -0.0714<br />x2        =   0.0444</p>

        <p>By default, Bio/LogisticRegression.py sets the following parameters:<br />    MAX_ITERATIONS = 500<br />    CONVERGE_THRESHOLD = 0.01</p>

        <p>The convergence threshold is too lenient, so the iterations stop before the expected values are reached. With a more stringent criterion (CONVERGE_THRESHOLD = 0.000000001), the estimates converge to the R/SAS values, provided MAX_ITERATIONS is greater than 7761 (on my system).</p>
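        <p>To illustrate why a loose threshold on the change in log-likelihood can stop well short of the maximum, here is a minimal sketch. It assumes plain gradient ascent with a fixed step size, which is not the update rule used in Bio/LogisticRegression.py, and it assumes the stopping rule compares the absolute change in log-likelihood between iterations against the threshold. The functions fit, log_likelihood and gradient, the learning rate, and the toy data mentioned below are all hypothetical.</p>

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z):
    # log(1 + e^z), computed without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(beta, xs, ys):
    """Log-likelihood of a logistic model; beta[0] is the intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        # log p = -softplus(-z) and log(1 - p) = -softplus(z)
        total -= _softplus(-z) if y == 1 else _softplus(z)
    return total


def gradient(beta, xs, ys):
    """Gradient of the log-likelihood with respect to beta."""
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        residual = y - _sigmoid(z)
        grad[0] += residual
        for j, xi in enumerate(x, start=1):
            grad[j] += residual * xi
    return grad


def fit(xs, ys, converge_threshold, max_iterations, learning_rate=0.05):
    """Hypothetical gradient-ascent fit; returns (beta, iterations used).

    Stops as soon as |llh - old_llh| <= converge_threshold, so a loose
    threshold can end the loop while the gradient is still far from zero.
    """
    beta = [0.0] * (len(xs[0]) + 1)
    old_llh = log_likelihood(beta, xs, ys)
    for iteration in range(1, max_iterations + 1):
        grad = gradient(beta, xs, ys)
        beta = [b + learning_rate * g for b, g in zip(beta, grad)]
        llh = log_likelihood(beta, xs, ys)
        if abs(llh - old_llh) <= converge_threshold:
            return beta, iteration
        old_llh = llh
    raise RuntimeError("did not converge in %d iterations" % max_iterations)
```

        <p>On a small made-up dataset, calling fit(xs, ys, converge_threshold=0.01, max_iterations=500) and fit(xs, ys, converge_threshold=1e-9, max_iterations=100000) follows the same trajectory, but the loose setting stops after fewer iterations and leaves a larger gradient, i.e. estimates further from the maximum-likelihood values.</p>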

        <p>MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the Bio/LogisticRegression.py module, but they should be part of the API of the train function, for example:<br />def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD = 0.000000001, MAX_ITERATIONS=10000):</p>

        <p>Note that the algorithm used requires a large number of iterations, and the train function does not report the degree of convergence reached when MAX_ITERATIONS is exceeded.</p>
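        <p>One way to expose both settings and to report how far convergence got when the limit is hit is sketched below. The driver iterate, its step callback, the ConvergenceError class and the lowercase parameter names are all hypothetical; a train function with the signature proposed above could delegate its loop to something like this. The error carries the iteration count, the last log-likelihood and the last change, so a caller can tell a near miss from a fit that is nowhere close.</p>

```python
class ConvergenceError(RuntimeError):
    """Hypothetical error raised when max_iterations is reached.

    Records how far convergence got instead of only saying that it failed.
    """

    def __init__(self, iterations, loglikelihood, change):
        super().__init__(
            "did not converge after %d iterations "
            "(log-likelihood %.6g, last change %.3g)"
            % (iterations, loglikelihood, change))
        self.iterations = iterations
        self.loglikelihood = loglikelihood
        self.change = change


def iterate(step, initial_llh, converge_threshold=1e-9, max_iterations=10000):
    """Repeat step() until the log-likelihood changes by <= converge_threshold.

    step is a hypothetical callable that performs one parameter update and
    returns the new log-likelihood. Returns (iterations, loglikelihood).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    old_llh = initial_llh
    for iteration in range(1, max_iterations + 1):
        llh = step()
        change = abs(llh - old_llh)
        if change <= converge_threshold:
            return iteration, llh
        old_llh = llh
    # Report the degree of convergence reached, not just the failure.
    raise ConvergenceError(max_iterations, llh, change)
```

        <p>A caller that catches ConvergenceError can inspect err.change and decide whether to rerun with a larger max_iterations or accept the estimates.</p>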

        <p>Jeffrey Whitaker provides Python code using an alternative algorithm: <br /><a class="external" href="http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py">http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py</a></p>
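        <p>For comparison with the iteration count reported above: R's glm fits logistic models by iteratively reweighted least squares, which for the logit link coincides with Newton-Raphson, and such fits typically converge in a handful of iterations. The sketch below is a plain full-step Newton-Raphson fit written only for illustration; it is not taken from Whitaker's code or from Bio/LogisticRegression.py, it has no step-size control, and newton_logistic and its helpers are hypothetical names.</p>

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z):
    # log(1 + e^z), computed without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def newton_logistic(xs, ys, converge_threshold=1e-9, max_iterations=50):
    """Fit logistic regression by full-step Newton-Raphson.

    Returns (beta, iterations); beta[0] is the intercept. Stops when the
    log-likelihood changes by at most converge_threshold between iterations.
    """
    rows = [[1.0] + [float(v) for v in x] for x in xs]  # add constant column
    k = len(rows[0])
    beta = [0.0] * k
    old_llh = None
    for iteration in range(1, max_iterations + 1):
        llh = 0.0
        score = [0.0] * k                     # gradient X^T (y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X^T W X
        for row, y in zip(rows, ys):
            z = sum(b * v for b, v in zip(beta, row))
            p = _sigmoid(z)
            llh -= _softplus(-z) if y == 1 else _softplus(z)
            w = p * (1.0 - p)
            for i in range(k):
                score[i] += (y - p) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        if old_llh is not None and abs(llh - old_llh) <= converge_threshold:
            return beta, iteration
        old_llh = llh
        step = _solve(info, score)  # Newton step (X^T W X)^-1 X^T (y - p)
        beta = [b + s for b, s in zip(beta, step)]
    raise RuntimeError("Newton-Raphson did not converge in %d iterations" % max_iterations)
```

        <p>It returns beta in the same layout as the R/SAS output quoted above (intercept first, then one coefficient per column of xs).</p>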

        <p>Furthermore, update_fn should also be passed the previous log-likelihood, or the change in log-likelihood, so that the actual convergence can be seen. Ideally update_fn would be more general than this and able to report more information, but the attached patch only adds the previous log-likelihood (old_llik). With the patch applied, a progress callback could look like this:</p>

        <pre>
from Bio import LogisticRegression

def show_progress(iteration, old_llh, loglikelihood):
    # Needs the attached patch: the unpatched train() does not pass old_llh.
    print("Iteration:", iteration, "Old:", old_llh, "Log-likelihood function:", loglikelihood, "Diff:", old_llh - loglikelihood)

model = LogisticRegression.train(xs, ys, update_fn=show_progress)  # xs, ys: the example data</pre>
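        <p>Going toward the more general callback suggested above, a callable object can record every (iteration, old_llh, loglikelihood) triple instead of only printing it, so the convergence history can be inspected after training. This is a hypothetical sketch that assumes the three-argument update_fn signature from the attached patch; ProgressLog and its methods are illustrative names, not part of Biopython.</p>

```python
class ProgressLog:
    """Hypothetical update_fn that records the convergence history.

    Assumes the patched three-argument callback signature
    (iteration, old_llh, loglikelihood).
    """

    def __init__(self, verbose=False):
        self.history = []  # (iteration, old_llh, loglikelihood, diff) tuples
        self.verbose = verbose

    def __call__(self, iteration, old_llh, loglikelihood):
        # Positive diff means the log-likelihood improved this iteration.
        diff = loglikelihood - old_llh
        self.history.append((iteration, old_llh, loglikelihood, diff))
        if self.verbose:
            print("Iteration:", iteration, "Old:", old_llh,
                  "Log-likelihood:", loglikelihood, "Diff:", diff)

    def last_change(self):
        """Absolute change at the most recent iteration, or None if empty."""
        if not self.history:
            return None
        return abs(self.history[-1][3])
```

        <p>With the patch applied, it would be used as progress = ProgressLog() followed by LogisticRegression.train(xs, ys, update_fn=progress), after which progress.history and progress.last_change() show how close the fit came to converging.</p>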

  <fieldset class="attachments"><legend>Files</legend>

    <a href="https://redmine.open-bio.org/attachments/download/1134/logreg.diff">logreg.diff</a>

    (2.34 KB)<br />

  </fieldset>


</body>

</html>