[Biopython-dev] NCBIWWW.qblast: Question about expected run time and time outs

Lev Tsypin ltsypin at uchicago.edu
Wed Jun 24 04:08:49 UTC 2015


Hi Peter,

It seems that I may have been blacklisted, unfortunately. I didn't think that
was the case because I've been traveling (and so my IP address has been
changing), but when I tried running my program through a VPN, it worked fine
again. I'll be more careful.

I am getting a failure when running test_NCBI_qblast.py, though:

FAIL: test_orchid_est (__main__.TestQblast)
------------------------------------------------------------
Traceback (most recent call last):
  File "test_NCBI_qblast.py', line 69, in test_orchid_est
    0.0000001, None, ["21554275", "18409071", "296087288"])
  File "test_NCBI_qblast.py', line 124, in run_qblast
    % ", ".join(expected_hits)
AssertionError: Missing all of 21554275, 18409071, 296087288 in alignments
------------------------------------------------------------
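
For reference, the kind of minimal check I have been running looks roughly
like this (just a sketch, using a fragment of the query sequence from my
earlier mail - my real script of course builds its queries differently):

from Bio.Blast import NCBIWWW, NCBIXML

# Short protein fragment, just to see whether a request gets through at all.
test_seq = "MSLSREENIYMGKISEQTERFEDMLEYMKKVVQTGQELSVEERNLL"

handle = NCBIWWW.qblast("blastp", "nr", test_seq, entrez_query="NOT Ciliata")
record = NCBIXML.read(handle)
print("Got %i alignments back" % len(record.alignments))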

Thank you for your help!

--Lev

On Mon, Jun 22, 2015 at 10:07 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> Hi Lev,
>
> Can you run any small tests with Biopython? e.g. test_NCBI_qblast.py
> in the Tests folder of the Biopython source code.
>
> If they also fail, it would be interesting to see the exception message(s).
> That might give some clues (e.g. a local network problem).
>
> My guess is that something changed at the NCBI - perhaps they are now
> stricter about long-running jobs, or, if you are very unlucky, it might
> be that your IP address was (temporarily) blacklisted for too much
> usage?
>
> Peter
>
> On Mon, Jun 22, 2015 at 3:49 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
> > Hi Peter,
> >
> > Unfortunately, I think that might not be an option for me. The software
> > I'm trying to write is meant to be an open-source tool that researchers
> > can just use without extensive setup. I'm afraid I can't ask people to
> > install standalone BLAST and learn to use computer clusters without
> > losing the accessibility of the program.
> >
> > Do you think this is really just the BLAST server being busy? As I said, I
> > didn't have any problems for a long time; the average time to get a BLAST
> > result back was about 6-7 minutes. Now I just don't get through at all.
> >
> > Best,
> > Lev
> >
> > On Mon, Jun 22, 2015 at 4:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
> > wrote:
> >>
> >> Hi Lev,
> >>
> >> My usual advice when dealing with any large-scale BLAST
> >> search is to download the NCBI database and use standalone
> >> BLAST+ locally, rather than the NCBI web service, which
> >> can be busy - especially during USA working hours.
> >>
> >> Do you have access to a local Linux cluster or similar? It is
> >> very likely there are people in your department/university
> >> already doing this - often the SysAdmin will keep a single
> >> shared copy of the databases up to date for everyone to
> >> use.
> >>
> >> (You would likely need to do some post-filtering to remove
> >> any Ciliata hits since the Entrez query option is only available
> >> when running BLAST at the NCBI.)
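> >>
> >> Roughly, something along these lines once BLAST+ is on the PATH and a
> >> local copy of nr has been formatted (an untested sketch - the file names
> >> and E-value cut-off are just placeholders):
> >>
> >> from Bio.Blast.Applications import NcbiblastpCommandline
> >> from Bio.Blast import NCBIXML
> >>
> >> # Run blastp against the local nr database, writing BLAST XML output.
> >> cline = NcbiblastpCommandline(query="query.faa", db="nr",
> >>                               evalue=0.001, outfmt=5, out="result.xml")
> >> stdout, stderr = cline()
> >>
> >> # Parse the XML just as you would the handle returned by qblast.
> >> with open("result.xml") as handle:
> >>     record = NCBIXML.read(handle)
> >> for alignment in record.alignments:
> >>     print(alignment.title)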
> >>
> >> Peter
> >>
> >> On Sun, Jun 21, 2015 at 7:19 PM, Lev Tsypin <ltsypin at uchicago.edu>
> >> wrote:
> >> > Hello everyone,
> >> >
> >> > I have been writing a tool that makes use of Biopython for automatic
> >> > BLAST searches--your libraries have made my life so much easier! I
> >> > really appreciate your work. I've recently begun to run into some
> >> > trouble, though, and I am not quite sure how to explain or respond to
> >> > it, so I wanted to ask for advice:
> >> >
> >> > The issue is that, of late, when I call the NCBIWWW.qblast function, it
> >> > takes forever--literally never finishing. Previously there were cases
> >> > where it would get stuck for a long time (up to an hour or so), but it
> >> > would eventually fight through whatever the obstacle was and carry on.
> >> > In those cases I also found that if I artificially restarted the
> >> > request, the function would recover and finish much faster. Here is an
> >> > example of a function call:
> >> >
> >> > blastp_result = NCBIWWW.qblast('blastp', 'nr',
> >> >     'MSLSREENIYMGKISEQTERFEDMLEYMKKVVQTGQELSVEERNLLSVAYKNTVGSRRSAWRSISAIQQKEESKGSKHLDLLTNYKKKIETELNLYCEDILRLLNDYLIKNATNAEAQVFFLKMKGDYYRYIAEYAQGDDHKKAADGALDSYNKASEIANSELSTTHPIRLGLALNFSVFHYEVLNDPSKACTLAKTAFDEAIGDIERIQEDQYKDATTIMQLIRDNLTLWTSEFQDDAEEQQE',
> >> >     entrez_query='NOT Ciliata').read()
> >> >
> >> > [I have wrapped the protein sequence above so that it fits in the
> >> > email, but when I actually run the function there are no newline
> >> > characters in it, of course.]
> >> >
> >> > My questions are the following: why does the function sometimes get
> >> > stuck for so long, and what should I do now that it never seems to
> >> > work any more? Do you have any suggestions for introducing a timeout
> >> > so that if, for example, the request takes longer than 10 minutes, it
> >> > would automatically retry? I know urllib2 has an optional timeout
> >> > parameter, but, looking at the source code for NCBIWWW.qblast(), it
> >> > wasn't obvious to me whether or how I could make use of it there.
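> >> >
> >> > Something along these lines is roughly what I have in mind (just a
> >> > sketch - I am not sure whether the global socket timeout actually
> >> > reaches the urllib2 calls inside qblast, and the retry count is
> >> > arbitrary):
> >> >
> >> > import socket
> >> > from urllib2 import URLError
> >> > from Bio.Blast import NCBIWWW
> >> >
> >> > # Assumes urllib2 inside qblast honours the global default timeout.
> >> > socket.setdefaulttimeout(600)  # give up on any single read after 10 min
> >> >
> >> > def qblast_with_retry(sequence, retries=3):
> >> >     # Re-issue the whole qblast request if one attempt stalls or fails.
> >> >     for attempt in range(1, retries + 1):
> >> >         try:
> >> >             handle = NCBIWWW.qblast('blastp', 'nr', sequence,
> >> >                                     entrez_query='NOT Ciliata')
> >> >             return handle.read()
> >> >         except (socket.timeout, URLError) as err:
> >> >             print("Attempt %i failed: %s" % (attempt, err))
> >> >     raise RuntimeError("qblast did not return after %i attempts" % retries)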
> >> >
> >> > Thank you very much for any advice.
> >> >
> >> > Best regards,
> >> > Lev
> >> >
> >> > _______________________________________________
> >> > Biopython-dev mailing list
> >> > Biopython-dev at mailman.open-bio.org
> >> > http://mailman.open-bio.org/mailman/listinfo/biopython-dev
> >
> >
>