<html>
<head>
<style>
body {
  font-family: Verdana, sans-serif;
  font-size: 0.8em;
  color:#484848;
}
h1, h2, h3 { font-family: "Trebuchet MS", Verdana, sans-serif; margin: 0px; }
h1 { font-size: 1.2em; }
h2, h3 { font-size: 1.1em; }
a, a:link, a:visited { color: #2A5685;}
a:hover, a:active { color: #c61a1a; }
a.wiki-anchor { display: none; }
fieldset.attachments {border-width: 1px 0 0 0;}
hr {
  width: 100%;
  height: 1px;
  background: #ccc;
  border: 0;
}
span.footer {
  font-size: 0.8em;
  font-style: italic;
}
</style>
</head>
<body>
Issue #2601 has been updated by Vincent Davis.

<ul>
  <li><strong>Description</strong> updated (<a title="View differences" href="https://redmine.open-bio.org/journals/diff/15352?detail_id=1630">diff</a>)</li>
  <li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li>
  <li><strong>Assignee</strong> changed from <i>Biopython Dev Mailing List</i> to <i>Vincent Davis</i></li>
  <li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li>
</ul>

<p>A find method was implemented over 8 years ago, and no additional feedback has been submitted. Closing issue.</p>
<hr />
<h1><a href="https://redmine.open-bio.org/issues/2601#change-15352">Bug #2601: Seq find() method: proposal</a></h1>

<ul><li>Author: Leighton Pritchard</li>
<li>Status: Closed</li>
<li>Priority: Normal</li>
<li>Assignee: Vincent Davis</li>
<li>Category: Main Distribution</li>
<li>Target version: Not Applicable</li></ul>

<p>A find() method for the Seq object was recently proposed on the mailing list.  I have extended Seq locally to include a find method that uses the re module and the reverse_complement function from Bio.Seq, and is described below.  In the original implementation, the search was meant to be called from the parent SeqRecord object, which populated itself with features describing the search results.</p>


        <p>I'm proposing this as a potential starting point for the implementation of a Seq.find() method.</p>


        <p>Note that the loop of re.search() calls was necessary to obtain the set of overlapping matches, as re.finditer() only returns non-overlapping matches.  The two functions that search in the forward-only and reverse-only directions could probably be combined into one, with the direction selected by a keyword argument, for neater code.</p>
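        <p>As an illustration of the overlapping-match point, here is a minimal sketch that uses only the standard re module. The helper name find_overlapping and the sample string "AAAA" are hypothetical and not part of the proposal. It restarts the search one position past the start of the previous hit and compares the result with re.finditer(). The lookahead form at the end is an alternative technique, not the one used in the proposed code.</p>

<pre>
```python
import re


def find_overlapping(pattern, text):
    """Return every match of pattern in text, including overlapping ones.

    Hypothetical helper for illustration; not part of the proposed patch.
    """
    regex = re.compile(pattern)
    hits = []
    pos = 0
    while pos < len(text):
        hit = regex.search(text, pos)
        if hit is None:
            break
        hits.append(hit)
        # Restart one position past the start of this hit, so that the
        # next search can still find a match overlapping the current one.
        pos = hit.start() + 1
    return hits


sequence = "AAAA"

# Repeated search() calls: overlapping matches are reported.
overlapping = [(hit.start(), hit.end())
               for hit in find_overlapping("AA", sequence)]

# finditer(): each match starts at or after the end of the previous one.
non_overlapping = [(hit.start(), hit.end())
                   for hit in re.finditer("AA", sequence)]

# Alternative: a capturing group inside a zero-width lookahead.
lookahead = [(hit.start(1), hit.end(1))
             for hit in re.finditer("(?=(AA))", sequence)]

print(overlapping)      # [(0, 2), (1, 3), (2, 4)]
print(non_overlapping)  # [(0, 2), (2, 4)]
print(lookahead)        # [(0, 2), (1, 3), (2, 4)]
```
</pre>

        <p>The positions printed here are 0-based Python offsets; the proposed code adds 1 to the start to report 1-based coordinates.</p>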


        <pre><code>####

# Imports the methods below rely on
import re
from Bio import Alphabet
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq, reverse_complement

def find_regexes(self, pattern):
    """ find_regexes(self, pattern)

        pattern           String, regular expression to search for

        Finds all occurrences of the passed regular expression in the
        sequence, and returns a list of tuples in the format:
        (start, end, match, strand).

        If the sequence is a nucleotide sequence, the reverse strand is
        also searched
    """
    # Find forward matches
    match_locations = [(hit.start()+1, hit.end(),
                        self.data[hit.start():hit.end()], 1)
                       for hit in self.__find_overlapping_regexes(pattern)]
    # If the sequence is a nucleotide sequence, look on the reverse
    # strand, too
    if self.alphabet.__class__ in [Alphabet.DNAAlphabet,
                                   Alphabet.RNAAlphabet,
                                   IUPAC.ExtendedIUPACDNA,
                                   IUPAC.IUPACAmbiguousDNA,
                                   IUPAC.IUPACUnambiguousDNA,
                                   IUPAC.IUPACAmbiguousRNA,
                                   IUPAC.IUPACUnambiguousRNA]:
        # Reverse-strand hits are tagged with strand -1
        rev_locations = [(hit.start()+1, hit.end(),
                          self.data[hit.start():hit.end()], -1)
                         for hit in
                         self.__find_overlapping_regexes_rev(pattern)]
        match_locations += rev_locations
    match_locations.sort()
    return match_locations

def __find_overlapping_regexes(self, pattern):
    """ Finds all overlapping regexes matching the passed pattern in the
        sequence, and returns a list of re.SRE_Match objects describing
        them.
    """
    hits = []
    pos = 0
    regex = re.compile(pattern)
    while pos &lt; len(self.data):
        hit = regex.search(self.data, pos=pos)
        if hit is None:
            break
        hits.append(hit)
        # Step one position past the hit start to allow overlaps
        pos = hit.start()+1
    return hits

def __find_overlapping_regexes_rev(self, pattern):
    """ Finds all overlapping regexes matching the passed pattern in the
        sequence, and returns a list of re.SRE_Match objects describing
        them, as hits positioned in the forward direction - i.e. start and
        end read in the forward sense.
    """
    hits = []
    pos = 0
    # re.compile() needs a string, so convert the reverse-complemented Seq
    regex = re.compile(str(reverse_complement(Seq(pattern, self.alphabet))))
    while pos &lt; len(self.data):
        hit = regex.search(self.data, pos=pos)
        if hit is None:
            break
        hits.append(hit)
        pos = hit.start()+1
    return hits</code></pre>
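        <p>The suggestion to merge the forward-only and reverse-only helpers, with the direction chosen by keyword, could look roughly like the sketch below. It is a simplified stand-in rather than the proposed method. It works on plain strings instead of Seq objects, it accepts only a literal, unambiguous DNA motif (not a regular expression), and the names find_motif, reverse_complement_str and the strand keyword are hypothetical. Coordinates follow the (start, end, match, strand) tuple format described above, with a 1-based start read on the forward strand.</p>

<pre>
```python
import re

# Complement table for unambiguous DNA only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_str(motif):
    """Reverse-complement a plain DNA string (hypothetical helper)."""
    return motif.translate(_COMPLEMENT)[::-1]


def find_motif(sequence, motif, strand=0):
    """Find overlapping occurrences of a literal DNA motif.

    strand=1 searches the forward strand only, strand=-1 the reverse
    strand only, and strand=0 both. Returns a sorted list of
    (start, end, match, strand) tuples, with a 1-based start and an
    inclusive end, both read in the forward sense.
    """
    searches = []
    if strand in (0, 1):
        searches.append((motif, 1))
    if strand in (0, -1):
        # A reverse-strand hit appears on the forward strand as the
        # reverse complement of the motif.
        searches.append((reverse_complement_str(motif), -1))

    locations = []
    for query, direction in searches:
        regex = re.compile(re.escape(query))
        pos = 0
        while pos < len(sequence):
            hit = regex.search(sequence, pos)
            if hit is None:
                break
            locations.append(
                (hit.start() + 1, hit.end(), hit.group(), direction))
            # Step one position past the hit start to allow overlaps.
            pos = hit.start() + 1
    return sorted(locations)


print(find_motif("GAATTCAAGCTT", "AAGC"))
# [(7, 10, 'AAGC', 1), (9, 12, 'GCTT', -1)]
```
</pre>

        <p>With this layout a palindromic motif such as GAATTC is reported once per strand, so a caller that wants one entry per site would need to deduplicate on (start, end).</p>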



</body>
</html>