[Biopython-dev] GSoC SearchIO project

Wibowo Arindrarto w.arindrarto at gmail.com
Mon Sep 3 10:14:59 UTC 2012


Hello everyone,

I'd like to update everyone on my latest SearchIO(?) developments. I've
made some progress and fixed a number of bugs since GSoC officially ended
two weeks ago. Here are a few things I'd like to share:

1. I've written a draft tutorial chapter for the submodule. It's been pushed
to my development repo (https://github.com/bow/biopython/tree/searchio), and
I'm temporarily hosting the HTML on my site
(http://bow.web.id/biopython/Tutorial.html). Comments and critiques are
welcome :).

2. Back to the naming issue: I'm still using SearchIO for now. I've
experimented with other names (Bio.Search and Bio.SeqSearch), and my
impression is that I like Bio.SeqSearch the most, followed by Bio.Search,
and then Bio.SearchIO. Bio.SeqSearch does feel confusing initially (we have
SeqUtils, SeqFeature, etc.), but after a while it's the one that feels most
natural.

3. And finally, something Peter and I discussed briefly before: what if we
merged the existing BLAST wrappers and NCBI qblast into Bio.(SeqSearch
/ Search / SearchIO)? While writing the tutorial I felt there was a lot of
overlap between this submodule and Bio.Blast, so the idea of merging
surfaced in my thoughts again. We could put the BLAST wrappers under
Bio.SeqSearch.Applications (for example), along with other wrappers (I have
a yet-untested Bio.HMMER3 wrapper, and possibly a Bio.BLAT wrapper, that
could go here as well). As for qblast (and other remote searches, like the
one HMMER provides at the moment), we could put them in
Bio.SeqSearch.Remote, perhaps. I think this would make things easier for
anyone who works with BLAST or other sequence search tools, as all the
related Biopython functionality would be grouped in one place.

This is just a thought for now, but I'd love to hear your thoughts on the
merge (and the naming ;) ).

cheers,
Bow


On Tue, Aug 21, 2012 at 6:01 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> On Tue, Aug 14, 2012 at 9:49 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > On Tue, Apr 10, 2012 at 1:58 AM, Brad Chapman wrote:
> >> Michiel;
> >>> Hi Eric, Peter,
> >>>
> >>> > How about Bio.Search, for now?
> >>>
> >>> I would prefer Bio.Pairwise or Bio.Align.Pairwise, since that tells
> >>> users something about what the module is for. Bio.Search could be
> >>> anything (search PubMed? search the Entrez databases? search Google?
> >>> anyway Bio.Search does not suggest that this module is about pairwise
> >>> alignments). But Peter previously mentioned that he doesn't like
> >>> Bio.Pairwise; can we convince you?
> >>
> >> I agree with Peter on this one. The module is primarily about searching
> >> a sequence database with an input via multiple methods, not about
> >> pairwise alignment of two sequences, which is what Bio.Align.Pairwise
> >> suggests to me.
> >>
> >> Brad
> >
> > One potential problem with Bio.Search (on top of the concerns raised
> > here about vagueness), which Bow and I were just talking about during
> > our weekly GSoC video call, was the existence of Bio/Search.py,
> > which is obsolete and long overdue for removal. I have just
> > deprecated it (something I forgot to do before the last release):
> >
> > https://github.com/biopython/biopython/commit/5a275ccd1df3def40df1eef517af755d373dadd8
> >
> > We'd earlier talked about using Bio.Search as the namespace. I was
> > worried about the potential existence on a user's machine of both
> > Bio/Search.py (the old obsolete code) and Bio/Search/__init__.py
> > (aka SearchIO, the new module) and which would take precedence
> > when doing: from Bio import Search
> >
> > Given how Python module installations work, that seems highly
> > likely to occur. The good news is that the package would take
> > priority - see http://www.python.org/doc/essays/packages.html
> >
> >>>>> What If I Have a Module and a Package With The Same Name?
> >>>>>
> >>>>> You may have a directory (on sys.path) which has both a module
> >>>>> spam.py and a subdirectory spam that contains an __init__.py
> >>>>> (without the __init__.py, a directory is not recognized as a
> >>>>> package).
> >>>>> In this case, the subdirectory has precedence, and importing spam
> >>>>> will ignore the spam.py file, loading the package spam instead. If
> >>>>> you want the module spam.py to have precedence, it must be
> >>>>> placed in a directory that comes earlier in sys.path.
> >
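The precedence rule quoted above comes from an essay written for much older Python versions. Below is a minimal sketch for checking it on a current Python 3 interpreter: it writes both a plain module and a package with the same name into a throwaway directory, puts that directory first on sys.path, and reports which one the import machinery actually loaded. The names `spam_demo` and `which_wins` are hypothetical, chosen only for this demonstration; they are not part of Biopython.

```python
import importlib
import os
import sys
import tempfile


def which_wins(name="spam_demo"):
    """Create both <name>.py and <name>/__init__.py in a temporary
    directory, import <name>, and report which of the two was loaded."""
    with tempfile.TemporaryDirectory() as root:
        # Plain module: <name>.py
        with open(os.path.join(root, name + ".py"), "w") as handle:
            handle.write("KIND = 'module'\n")
        # Package with the same name: <name>/__init__.py
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "__init__.py"), "w") as handle:
            handle.write("KIND = 'package'\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()  # make sure the new files are seen
        try:
            mod = importlib.import_module(name)
            return mod.KIND
        finally:
            # Undo our changes so the demo leaves nothing behind.
            sys.path.remove(root)
            sys.modules.pop(name, None)


print(which_wins())  # prints: package
```

Both files sit in the same sys.path entry here, which mirrors the situation Peter describes (an old Bio/Search.py left next to a new Bio/Search/ package after an upgrade). A module that lives in a directory earlier on sys.path would still win, as the essay notes.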
> > So there is no technical reason to avoid Bio.Search as an
> > option for the Bio.SearchIO namespace. We could then
> > have Bio.Search.Applications for command line wrappers,
> > consistent with Bio.Phylo.Applications, Bio.Motif.Applications
> > and Bio.Align.Applications.
> >
> > Of course, Bio.Search is still perhaps too broad a name... but
> > on balance perhaps it is still better than Bio.SearchIO?
> >
> > Regards,
> >
> > Peter
>
> Hi everyone,
>
> If I may add my two cents, for now I am in favor of putting the module
> under Bio.Search. It is not the best name out there (it does sound a
> bit vague), but it's the one that seems the most intuitive (until
> a better alternative comes along). There were some other alternatives
> that Peter and I have discussed, but they seem less appealing to us.
> You're free to add your thoughts on these, of course :) :
>
> - Bio.SeqSearch. This sounds ok, but when you consider we have
> Bio.Seq, Bio.SeqRecord, Bio.SeqFeature, and Bio.SeqUtils, it becomes
> quite confusing quickly.
>
> - Bio.PSearch ('p' for pairwise). This one seemed the least intuitive
> of the three options, so I'm not so big on this.
>
> For now, I'm still writing everything (code, docstrings, tutorial)
> using SearchIO. It would be better if we could agree on a more
> suitable name, though.
>
> On another note, I'm also in favor of using the Bio.Phylo module
> skeleton for Bio.SearchIO / Bio.Search. We could then group all sequence
> search-related application wrappers under Applications (I actually
> prefer 'app' for better PEP 8 compliance, but that's another
> discussion) and perhaps even refactor our remote search calls (e.g.
> qblast) under Bio.Search as well.
>
> cheers,
> Bow
>