<div dir="ltr"><div>Thank you for wanting to help, Don. I'll throw in my opinion, although it reflects a user's use case rather than a developer's position.<br><br>I used the old Bio.pairwise2 and eventually found it easier and quicker to run EMBOSS distmat as a subprocess, even though I had to write my own parser for its output format. That sums up my experience with (Bio)python: it is great for converting sequences, interacting with APIs, wrapping pipelines, doing lighter analyses and maybe plotting the results. Simple tasks like "Rename all the sequences in this FASTA file according to the data in that SQL DB" can even be done in an interactive shell, which is undeniably cool. However, I rarely if ever do costly computation in Biopython itself; instead I call tools that someone else has written in C or whatever.<br><br></div>The point: the performance of Biopython itself is very rarely a bottleneck. If you manage to make alignment (pairwise and multiple), statistical analysis of trees (consensus networks, supertrees, consensus trees and such), distance calculations and maybe even some sequence searches (like HMMs or intron prediction) run quickly enough that I don't have to bother installing external tools and writing wrappers, that would be great. I'm afraid, though, that the practical way to get there is to write a lot of wrappers and bloat the distribution with all those tools.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-09-23 23:03 GMT+08:00 Gunning, Don <span dir="ltr">&lt;<a href="mailto:don.gunning@intel.com" target="_blank">don.gunning@intel.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Peter<br>
<br>
Thanks for the reply.<br>
<br>
Regarding disk I/O, this is a universal issue. The main solutions available are code restructuring and the use of SSDs to hide latency.<br>
<br>
Regarding pairwise2, we are looking for packages that we can include in our distribution. Can anyone advise how widely it is used and whether there is a need for further performance improvement?<br>
<br>
Regards<br>
<span class="HOEnZb"><font color="#888888"><br>
Don<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
-----Original Message-----<br>
From: Peter Cock [mailto:<a href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.<wbr>com</a>]<br>
Sent: Thursday, September 01, 2016 5:31 AM<br>
To: Gunning, Don &lt;<a href="mailto:don.gunning@intel.com">don.gunning@intel.com</a>&gt;<br>
Cc: <a href="mailto:biopython@biopython.org">biopython@biopython.org</a>; Biopython-Dev Mailing List &lt;<a href="mailto:biopython-dev@mailman.open-bio.org">biopython-dev@mailman.open-<wbr>bio.org</a>&gt;<br>
Subject: Re: [Biopython] Intel Python distribution<br>
<br>
Hi Don,<br>
<br>
Biopython covers so many topics that everyone using it probably<br>
has a different bottleneck. I tend to do basic sequence manipulations<br>
where disk I/O is the main bottleneck, although problems in this<br>
area often lie in the end user's script and can be fixed there (e.g. by<br>
taking advantage of Python sets rather than lists for membership checking).<br>
<br>
I do know that the old Bio.pairwise2 code performed poorly<br>
on larger sequences (it has a C backend), but that has been<br>
improved by a rewrite in our latest release, Biopython 1.68.<br>
<br>
Hopefully some of our community will volunteer to talk about<br>
where they think Biopython needs some optimisation.<br>
<br>
Regards,<br>
<br>
Peter<br>
<br>
On Wed, Aug 31, 2016 at 2:34 PM, Gunning, Don &lt;<a href="mailto:don.gunning@intel.com">don.gunning@intel.com</a>&gt; wrote:<br>
><br>
><br>
> Intel has just announced the Intel Python distribution, an open source<br>
> version with many packages optimized for performance.<br>
><br>
><br>
><br>
> <a href="https://software.intel.com/en-us/python-distribution" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/python-distribution</a><br>
><br>
><br>
><br>
> The life sciences market is an area we are trying to help, and your<br>
> project seemed interesting as it is aligned with our thinking.<br>
><br>
><br>
><br>
> Could someone write me back and discuss how we could contribute to your<br>
> project and get our distribution more widely used in your community? One<br>
> thought is for Intel to optimize and include packages that are regularly<br>
> used in your community.<br>
><br>
><br>
><br>
> We look forward to hearing from you and potentially collaborating.<br>
><br>
><br>
><br>
> Regards<br>
><br>
><br>
><br>
> Don<br>
><br>
> Don Gunning<br>
><br>
> Software Program Manager<br>
><br>
> Technical computing group<br>
><br>
> Developer Product Division<br>
><br>
> Intel Corporation<br>
><br>
> 1906 Fox Dr<br>
><br>
> Champaign Il 61820<br>
><br>
> 217 403 4213<br>
><br>
><br>
><br>
><br>
> ______________________________<wbr>_________________<br>
> Biopython mailing list - <a href="mailto:Biopython@mailman.open-bio.org">Biopython@mailman.open-bio.org</a><br>
> <a href="http://mailman.open-bio.org/mailman/listinfo/biopython" rel="noreferrer" target="_blank">http://mailman.open-bio.org/<wbr>mailman/listinfo/biopython</a><br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><font face="arial,helvetica,sans-serif">Alexey Morozov,<br>LIN SB RAS, bioinformatics group.<br>Irkutsk, Russia.<br></font></div>
</div>