<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Just one more comment regarding alternatives to BLAST. I recently
    came across one that is not as sensitive as BLAST but is a lot
    faster; it's called lambda:<br>
    <br>
    <a class="moz-txt-link-freetext" href="http://www.seqan.de/projects/lambda/">http://www.seqan.de/projects/lambda/</a><br>
    <br>
    I've tried it out and I'm very impressed with the results: it can do
    full UniRef100 searches in a fraction of a second. There are still
    some issues to iron out, especially in the indexing, which is very
    memory- and disk-hungry. But all in all it does seem to be a real
    alternative to BLAST.<br>
    <br>
    Its output is BLAST-compatible: it can write either the classic
    pairwise output (-m 0) or the tabular output (-m 8). There is no XML
    output yet, though.<br>
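    <br>
    For reference, a minimal Java sketch of reading one line of that
    tabular output. It assumes the 12 default columns that BLAST writes
    for -m 8 (query id, subject id, percent identity, alignment length,
    mismatches, gap opens, query start/end, subject start/end, e-value,
    bit score) and that lambda follows the same layout; the names
    BlastTabular, TabularHsp and parseLine are hypothetical, not an
    existing BioJava API.<br>

```java
public class BlastTabular {

    // One row of BLAST tabular output (-m 8 in legacy BLAST, -outfmt 6 in BLAST+),
    // assuming the 12 default columns in their standard order.
    public record TabularHsp(String queryId, String subjectId, double percentIdentity,
                             int alignmentLength, int mismatches, int gapOpens,
                             int queryStart, int queryEnd, int subjectStart, int subjectEnd,
                             double evalue, double bitScore) {}

    // Parses a single data line. Comment lines (starting with '#', as in -m 9)
    // are assumed to have been skipped by the caller.
    public static TabularHsp parseLine(String line) {
        String[] f = line.split("\t");
        if (f.length < 12) {
            throw new IllegalArgumentException("Expected 12 tab-separated columns, got " + f.length);
        }
        return new TabularHsp(
                f[0], f[1],
                Double.parseDouble(f[2].trim()),
                Integer.parseInt(f[3].trim()), Integer.parseInt(f[4].trim()),
                Integer.parseInt(f[5].trim()),
                Integer.parseInt(f[6].trim()), Integer.parseInt(f[7].trim()),
                Integer.parseInt(f[8].trim()), Integer.parseInt(f[9].trim()),
                Double.parseDouble(f[10].trim()), Double.parseDouble(f[11].trim()));
    }

    public static void main(String[] args) {
        String line = "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t15\t214\t1e-100\t380.0";
        TabularHsp hsp = parseLine(line);
        System.out.println(hsp.subjectId() + " evalue=" + hsp.evalue() + " bits=" + hsp.bitScore());
    }
}
```

    Subject coordinates are kept as written, so a minus-strand blastn hit
    simply has subjectStart greater than subjectEnd.<br>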
    <br>
    So this supports the case for having some kind of framework that
    can deal with the results of a sequence homology search, with the
    actual parsers then implemented on a per-case basis.<br>
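    <br>
    To make the idea concrete, a minimal Java sketch of such a framework,
    assuming a program-independent Result/Hit/Hsp model and one parser
    class per output format. All names here (SearchIOSketch,
    SearchResultParser, TabularParser, Hsp, Hit, Result) are hypothetical
    and not part of BioJava; only a tabular (-m 8) parser is shown, and an
    XML parser would be a second implementation of the same interface.<br>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SearchIOSketch {

    // Hypothetical program-independent model: one Result per query,
    // one Hit per subject, one or more Hsps per Hit.
    public record Hsp(double percentIdentity, int queryStart, int queryEnd,
                      int subjectStart, int subjectEnd, double evalue, double bitScore) {}
    public record Hit(String subjectId, List<Hsp> hsps) {}
    public record Result(String queryId, List<Hit> hits) {}

    // One implementation per output format (tabular, XML, ...).
    public interface SearchResultParser {
        List<Result> parse(BufferedReader in) throws IOException;
    }

    // Parser for the 12-column tabular format (-m 8).
    public static class TabularParser implements SearchResultParser {
        @Override
        public List<Result> parse(BufferedReader in) throws IOException {
            // query id -> (subject id -> HSPs), keeping input order
            Map<String, Map<String, List<Hsp>>> byQuery = new LinkedHashMap<>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank() || line.startsWith("#")) continue; // skip comments
                String[] f = line.split("\t");
                if (f.length < 12) throw new IOException("Malformed line: " + line);
                Hsp hsp = new Hsp(Double.parseDouble(f[2]),
                        Integer.parseInt(f[6].trim()), Integer.parseInt(f[7].trim()),
                        Integer.parseInt(f[8].trim()), Integer.parseInt(f[9].trim()),
                        Double.parseDouble(f[10]), Double.parseDouble(f[11]));
                byQuery.computeIfAbsent(f[0], q -> new LinkedHashMap<>())
                       .computeIfAbsent(f[1], s -> new ArrayList<>())
                       .add(hsp);
            }
            List<Result> results = new ArrayList<>();
            for (Map.Entry<String, Map<String, List<Hsp>>> q : byQuery.entrySet()) {
                List<Hit> hits = new ArrayList<>();
                for (Map.Entry<String, List<Hsp>> s : q.getValue().entrySet()) {
                    hits.add(new Hit(s.getKey(), List.copyOf(s.getValue())));
                }
                results.add(new Result(q.getKey(), List.copyOf(hits)));
            }
            return results;
        }
    }

    public static void main(String[] args) throws IOException {
        String tabular = String.join("\n",
                "# comment line",
                "q1\tsA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200",
                "q1\tsA\t90.0\t50\t5\t0\t120\t169\t130\t179\t1e-10\t80",
                "q1\tsB\t70.0\t100\t30\t1\t1\t100\t5\t104\t1e-5\t60",
                "q2\tsC\t99.0\t80\t1\t0\t1\t80\t1\t80\t1e-40\t150");
        SearchResultParser parser = new TabularParser();
        for (Result r : parser.parse(new BufferedReader(new StringReader(tabular)))) {
            for (Hit h : r.hits()) {
                System.out.println(r.queryId() + " -> " + h.subjectId()
                        + " (" + h.hsps().size() + " HSPs)");
            }
        }
    }
}
```

    The point of this layout is that downstream code depends only on
    SearchResultParser and the records, so switching from tabular to XML
    input would mean swapping the parser instance rather than rewriting
    the code that consumes the results.<br>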
    <br>
    Jose<br>
    <br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 10.05.2015 14:04, Paolo Pavan wrote:<br>
    </div>
    <blockquote
cite="mid:CAD2LbkxYm_0OChiYY8ADyZoM-LWtk8RoNWn2PEVHQ70hf-KW_w@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>Hello!<br>
              </div>
              I obviously share Peter's and Jose's opinion. Moreover,
              as I already wrote, I have used this new feature in a
              second project that I could also describe and submit to
              BioJava, if there is any interest.<br>
              <br>
              About Andreas' questions:<br>
            </div>
            "Does your module support psiblast, rpsblast, tblastx and
            blast+ and what versions?": At the moment it supports
            blastn, blastp, blastx, tblastn and tblastx, version 2.2.29.
            I'm not sure about psiblast and rpsblast; I still need to
            test them.<br>
            But it has been designed so that updating a single parser
            (or adding a new search program while staying within the
            framework) only requires writing a single class. This keeps
            the code simple and neat, which in my opinion is very
            important for future developers.<br>
            <br>
            "the disadvantage is that you constantly need to update them
            to the variant of blast plus version of the output file
            format": unfortunately, this is a problem every one of us
            has to face when using new NCBI programs. It happened with
            legacy BLAST, it happened many times with the GenBank
            format, and it is happening with BLAST+. I just hope that
            they will be kind enough to state the format version
            explicitly inside the XML, if not to give the program itself
            a different name (for example blast3 or blast++) to avoid
            confusion. We can't do anything about that; we can only try
            to keep things simple and easy to reuse.<br>
            <br>
          </div>
          Just to express my opinion: I think every Bio* project should
          address these "base level" problems before others, so that
          developers can focus on higher-level abstractions. I'm sure
          this will be appreciated by the community and will broaden
          BioJava's user base.<br>
          <br>
        </div>
        Paolo<br>
        <div>
          <div>
            <div class="gmail_extra"><br>
              <div class="gmail_quote">2015-05-06 12:15 GMT+02:00 Jose
                Manuel Duarte <span dir="ltr">&lt;<a
                    moz-do-not-send="true"
                    href="mailto:jose.duarte@psi.ch" target="_blank">jose.duarte@psi.ch</a>&gt;</span>:<br>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">I'd say that having
                  some common data structure to model the output of a
                  sequence homology search would be beneficial. For
                  instance, a BLAST alternative might appear one day (I'm
                  eagerly awaiting it!). The common data structure
                  should be able to model the output of any of the
                  different programs.<br>
                  <br>
                  There are already some alternatives to blast:<br>
                  <br>
                  SANS and SANSparallel by Liisa Holm (<a
                    moz-do-not-send="true"
                    href="http://www.ncbi.nlm.nih.gov/pubmed/22962464"
                    target="_blank">http://www.ncbi.nlm.nih.gov/pubmed/22962464</a>,
                  <a moz-do-not-send="true"
href="http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full"
                    target="_blank">http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full</a>)<br>
                  USEARCH (commercial) (<a moz-do-not-send="true"
                    href="http://drive5.com/usearch/" target="_blank">http://drive5.com/usearch/</a>)<br>
                  BLAT (<a moz-do-not-send="true"
                    href="https://genome.ucsc.edu/FAQ/FAQblat.html#blat3"
                    target="_blank">https://genome.ucsc.edu/FAQ/FAQblat.html#blat3</a>)<br>
                  <br>
                  In fact, SANSparallel looks very promising: it's
                  incredibly fast, though less sensitive than BLAST.<br>
                  <br>
                  Cheers<span class=""><font color="#888888"><br>
                      <br>
                      Jose</font></span>
                  <div>
                    <div class="h5"><br>
                      <br>
                      <br>
                      <br>
                      On 06.05.2015 10:47, Peter Cock
                      wrote:<br>
                    </div>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px 0.8ex;border-left:1px solid
                    rgb(204,204,204);padding-left:1ex">
                    <div>
                      <div class="h5">
                        On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic
                        &lt;<a moz-do-not-send="true"
                          href="mailto:andreas@sdsc.edu" target="_blank">andreas@sdsc.edu</a>&gt;
                        wrote:<br>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan
                          &lt;<a moz-do-not-send="true"
                            href="mailto:paolo.pavan@gmail.com"
                            target="_blank">paolo.pavan@gmail.com</a>&gt;
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
                            0.8ex;border-left:1px solid
                            rgb(204,204,204);padding-left:1ex">
                            As seen in other Bio* projects, alongside the
                            Sequence IO and Alignment IO<br>
                            procedures it could also have a Search Result
                            IO.<br>
                          </blockquote>
                          I never understood why other Bio* projects
                          have special Blast modules.<br>
                          Perhaps XML parsing is not as easy as it is in
                          Java? Please see the code at<br>
                          the bottom of this message.<br>
                        </blockquote>
                        Python at least has a range of XML parsing
                        libraries which are up to the<br>
                        task. However, as Paolo wrote:<br>
                        <br>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
                            0.8ex;border-left:1px solid
                            rgb(204,204,204);padding-left:1ex">
                            The advantage is to define common data
                            structures that model HSPs, Hits and<br>
                            Results while abstracting away the<br>
                            underlying search program.<br>
                          </blockquote>
                        </blockquote>
                        This is the big advantage of BioPerl's and
                        Biopython's SearchIO modules.<br>
                        You can at least in theory switch between
                        parsing BLAST XML, BLAST<br>
                        tabular, BLAST plain text (shudder), or another
                        related format without<br>
                        major changes to your code.<br>
                        <br>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          and the disadvantage is that you constantly
                          need to update them to the<br>
                          variant of blast plus version of the output
                          file format.<br>
                        </blockquote>
                        I think it is much better to have this
                        housekeeping done once, centrally, in<br>
                        a Bio* library than re-invented by everyone
                        parsing the BLAST output.<br>
                        However, the NCBI BLAST XML output has been
                        fairly stable, and the<br>
                        new output has a formal schema, so it should be
                        even more dependable.<br>
                        <br>
                        Peter<br>
                      </div>
                    </div>
                    <span class="">
                      _______________________________________________<br>
                      biojava-dev mailing list<br>
                      <a moz-do-not-send="true"
                        href="mailto:biojava-dev@mailman.open-bio.org"
                        target="_blank">biojava-dev@mailman.open-bio.org</a><br>
                      <a moz-do-not-send="true"
                        href="http://mailman.open-bio.org/mailman/listinfo/biojava-dev"
                        target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-dev</a><br>
                    </span></blockquote>
                </blockquote>
              </div>
              <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>