<html>
<head>
<style>
body {
font-family: Verdana, sans-serif;
font-size: 0.8em;
color:#484848;
}
h1, h2, h3 { font-family: "Trebuchet MS", Verdana, sans-serif; margin: 0px; }
h1 { font-size: 1.2em; }
h2, h3 { font-size: 1.1em; }
a, a:link, a:visited { color: #2A5685;}
a:hover, a:active { color: #c61a1a; }
a.wiki-anchor { display: none; }
fieldset.attachments {border-width: 1px 0 0 0;}
hr {
width: 100%;
height: 1px;
background: #ccc;
border: 0;
}
span.footer {
font-size: 0.8em;
font-style: italic;
}
</style>
</head>
<body>
Issue #2176 has been updated by Vincent Davis.
<ul>
<li><strong>Description</strong> updated (<a title="View differences" href="https://redmine.open-bio.org/journals/diff/15360?detail_id=1657">diff</a>)</li>
<li><strong>Assignee</strong> changed from <i>Biopython Dev Mailing List</i> to <i>Vincent Davis</i></li>
</ul>
<p>Looks like the final commets where never done. That is remover "We could perhaps deprecate record.database_letters immediately, and at a later point, record.query_letters" <br />I see a TODO in the code to remove database_letters.<br /><a class="external" href="https://github.com/biopython/biopython/blob/589bc075ace24cb29bcd69c9789963df7febeb4d/Bio/Blast/NCBIXML.py#L188">https://github.com/biopython/biopython/blob/589bc075ace24cb29bcd69c9789963df7febeb4d/Bio/Blast/NCBIXML.py#L188</a></p>
<p>I'll create an issue on github.</p>
<hr />
<h1><a href="https://redmine.open-bio.org/issues/2176#change-15360">Bug #2176: XML Blast parser: miscellaneous bug fixes and cleanup</a></h1>
<ul><li>Author: Jacob Joseph</li>
<li>Status: New</li>
<li>Priority: Normal</li>
<li>Assignee: Vincent Davis</li>
<li>Category: Main Distribution</li>
<li>Target version: Not Applicable</li>
<li>URL: </li></ul>
<p>This follows the discussion started in bug 2051. The blast XML parser does now work (Thanks!), but could still use a little work. Here's a list of the issues I can see now. I'll follow with patches to correct a few.</p>
<p>In Record.py, HSP.identities, HSP.gaps, and HSP.positives are still<br />defined as (None,None) tuples. However, in NCBIXML.py, these<br />variables are set as integers. I don't see a point of a tuple at all,<br />at least for NCBIXML. (I realize it is used in NCBIStandalone.py).<br />Most importantly, the inconsistency makes it difficult to handle cases<br />when the parameter is not set. It seems easiest, though, to just<br />retain the tuple format.</p>
<p>In the past, I worried that the order of tuple building for<br />self._blast.gap_penalties or ka_params could cause the tuple to have<br />an incorrect ordering. I seem to remember hitting an issue where the<br />tuple was built with the wrong length, but I can't be specific. In<br />general, it remains odd to me to not just use a list and set each<br />element respectively. If necessary, one could convert to a tuple when<br />finished or use some other approach that does not rely upon order.</p>
<p>Why not use query_len, as defined in the XML file, or query_length<br />instead of query_letters as a variable name? In<br />BlastParser._end_Iteration, self._blast.query_letters is set. This is<br />not defined/documented in the Parameters class in Record.py. Rather,<br />query_length is defined there. In the Header class, though, the name<br />query_letters is used. There also seems to be some confusion between<br />num_letters_in_database, num_sequences_in_database, database_letters,<br />and database_sequences. Note that even if this naming is not<br />corrected, NCBIXML.py:186 is wrong with "self._blast_query_letters" <br />rather than "self._blast.query_letters".</p>
<p>Similarly, why store the bit score and E-value as 'bits' and<br />'_hsp.expect'/'descr.e' rather than just using bit_score and<br />evalue, as in the blast XML ouput?</p>
<p>I make use of <Hsp_align-len> in 2.2.13. This value missing<br />entirely.</p>
<p>The parsing of <Hit_id> and <Hit_def> is confusing. For example,<br /><Hit><br /> <Hit_num>1</Hit_num><br /> <Hit_id>gnl|BL_ORD_ID|0</Hit_id><br /> <Hit_def>3377250</Hit_def><br /> ...<br />results in _hit.title set to "gnl|BL_ORD_ID|0 3377250". I would<br />rather they remain separate (or both methods be used).</p>
<p>This is certainly not an exhaustive list. I'm happy to provide<br />another patch correcting many of these inconsistencies. At the<br />very least, the variable names defined in Record.py should be<br />used in NCBIXML.py. May I modify at least the above names to<br />correspond more closely to the names used in the XML? I know<br />I've found this particularly confusing.</p>
<p>-Jacob</p>
<fieldset class="attachments"><legend>Files</legend>
<a href="https://redmine.open-bio.org/attachments/download/533/NCBIXML.patch">NCBIXML.patch</a>
(2.6 KB)<br />
<a href="https://redmine.open-bio.org/attachments/download/534/Record.patch">Record.patch</a>
(3.37 KB)<br />
<a href="https://redmine.open-bio.org/attachments/download/1011/no_blast_tuples.patch">no_blast_tuples.patch</a>
(1.42 KB)<br />
</fieldset>
<hr />
<span class="footer"><p>You have received this notification because you have either subscribed to it, or are involved in it.<br />To change your notification preferences, please click here and login: <a class="external" href="http://redmine.open-bio.org">http://redmine.open-bio.org</a></p></span>
</body>
</html>