[Biopython] help with ncbiWWW

Damian Menning dmenning at mail.usf.edu
Wed Jul 26 20:50:50 UTC 2017


I ran into a similar problem downloading multiple FASTA files from NCBI,
where it would get 'hung up' on large sequences.  I found a function on
StackOverflow that worked well. It's super simple, effective, and should
work with your search with minor tweaking.  It's currently set to time out
after 10 seconds.
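
To give a feel for how the timer behaves before wiring it into a BLAST or
Entrez call, here is a small self-contained sketch (Python 3). It is an
illustration only: slow_call, quick_call and outcome are made-up names,
time.sleep stands in for a download that hangs, and I raise the built-in
TimeoutError rather than a plain Exception so the caller can tell a timeout
apart from other failures. Note that the worker thread is abandoned, not
killed, so a stuck call may keep running in the background.

```python
import functools
import time
from threading import Thread


def timeout(seconds):
    """Give up waiting on the wrapped function after `seconds` seconds."""
    def deco(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pre-load the result slot with the error raised on timeout.
            res = [TimeoutError("function [%s] timeout [%s seconds] exceeded!"
                                % (func.__name__, seconds))]

            def worker():
                try:
                    res[0] = func(*args, **kwargs)
                except Exception as e:
                    res[0] = e

            t = Thread(target=worker)
            t.daemon = True  # a stuck worker will not block interpreter exit
            t.start()
            t.join(seconds)  # wait at most `seconds` for the worker
            if isinstance(res[0], BaseException):
                raise res[0]
            return res[0]
        return wrapper
    return deco


@timeout(0.2)
def slow_call():
    # Hypothetical stand-in for a download that hangs.
    time.sleep(2)
    return "finished"


@timeout(2)
def quick_call():
    # Hypothetical stand-in for a download that returns promptly.
    return "finished"


def outcome(func):
    """Run func and report whether it finished or timed out."""
    try:
        return func()
    except TimeoutError:
        return "timed out"


print(outcome(quick_call))
print(outcome(slow_call))
```

The same wrapper can be applied without the @ syntax, e.g.
timeout(10)(some_function), which is how the attached script uses it.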

  Damian

On Wed, Jul 26, 2017 at 6:24 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> That does help, thank you.
>
> First of all that tells me you are using Windows and your Python is
> from Anaconda (probably not important here).
>
> Now, I had been guessing the code was getting stuck while actually
> connecting to the NCBI and waiting for an update - which is where that
> socket timeout would come into play.
>
> I see now the problem is when Biopython checks for an update,
> waits for a bit, checks for an update, waits for a bit, ... and never
> gives up:
>
> https://github.com/biopython/biopython/blob/biopython-170/Bio/Blast/NCBIWWW.py#L164
>
> The code increases the wait interval to 120s (two minutes), but
> currently has no (optional) maximum total waiting time. Adding
> this as an option seems sensible (e.g. a maximum total waiting
> time of say 5 or 10 mins).
>
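
A rough sketch of what such a cap on total waiting time might look like.
This is not the Biopython code: poll_until_ready, check, first_wait,
max_interval and max_wait are hypothetical names, the doubling of the delay
is an assumption, and the numbers are only illustrative. The sleep and clock
functions are passed in so the logic can be tried without really waiting;
FakeClock is a demo helper for that purpose.

```python
import time


def poll_until_ready(check, first_wait=3, max_interval=120, max_wait=600,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or max_wait seconds have elapsed.

    The delay between checks doubles each round, capped at max_interval.
    Returns True if check() succeeded, False if the time budget ran out.
    """
    deadline = clock() + max_wait
    wait = first_wait
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # give up instead of waiting forever
        sleep(min(wait, remaining))
        wait = min(wait * 2, max_interval)


class FakeClock:
    """Demo helper: a clock that only advances when sleep() is called."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# A job that never becomes ready: the loop stops once max_wait is used up.
fake = FakeClock()
ready = poll_until_ready(lambda: False, sleep=fake.sleep, clock=fake.time)
print(ready, fake.now)
```

With a real job, check would be a function that asks the server whether the
results are ready, and a False return would be the caller's cue to report
the failure or resubmit.
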
> Also, it would be good to check if the NCBI is returning some
> clue or error message which our code does not understand...
>
> From your initial description it sounds like you have not found
> any single example which fails - so this is going to be hard to
> test.
>
> Peter
>
> On Wed, Jul 26, 2017 at 3:04 PM, Pejvak Moghimi
> <pejvak.moghimi at york.ac.uk> wrote:
> > Hi Peter,
> >
> > Here it is:
> >
> > Traceback (most recent call last):
> >
> >   File "<ipython-input-107-561cd74d2097>", line 1, in <module>
> >     runfile('D:/Dropbox/Pejvak Moghimi/DMT_project/blast_for_clav_seqs/blastScript(altered).py', wdir='D:/Dropbox/Pejvak Moghimi/DMT_project/blast_for_clav_seqs')
> >
> >   File "C:\Users\pezhv\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
> >     execfile(filename, namespace)
> >
> >   File "C:\Users\pezhv\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
> >     exec(compile(f.read(), filename, 'exec'), namespace)
> >
> >   File "D:/Dropbox/Pejvak Moghimi/DMT_project/blast_for_clav_seqs/blastScript(altered).py", line 116, in <module>
> >     result_handle = NCBIWWW.qblast("blastp", "nr", sequence, hitlist_size=500, entrez_query = orgn_specified)
> >
> >   File "C:\Users\pezhv\Anaconda3\lib\site-packages\Bio\Blast\NCBIWWW.py", line 164, in qblast
> >     time.sleep(wait)
> >
> >
> > Cheers,
> > Pej.
> >
> >
> > On 26 July 2017 at 14:57, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >>
> >> Hi Pej.
> >>
> >> Hmm. Maybe setting the timeout is not going to solve your
> >> problem. I was hoping that would be a neat solution.
> >>
> >> Can you show us the stack trace when you had to stop a job
> >> please?
> >>
> >> I assume you are using control+c to do this, in which case
> >> Python ought to stop with the exception KeyboardInterrupt.
> >> What I am interested in here is where in the code Python
> >> is getting stuck. That would be a good clue.
> >>
> >> Peter
> >>
> >> On Wed, Jul 26, 2017 at 2:47 PM, Pejvak Moghimi
> >> <pejvak.moghimi at york.ac.uk> wrote:
> >> > Hi Peter,
> >> >
> >> > That solution, so far, does not seem to have worked with either the 10
> >> > or the 30 second option.
> >> >
> >> > Cheers,
> >> > Pej.
> >> >
> >> > On 26 July 2017 at 13:29, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >> >>
> >> >> I am hoping that putting this near the start of your script will
> >> >> apply the default timeout to all your BLAST calls (or other
> >> >> network calls, e.g. NCBI Entrez):
> >> >>
> >> >> import socket
> >> >> socket.setdefaulttimeout(30)  # timeout in seconds
> >> >>
> >> >> Peter
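
A small sketch of what that setting does, for anyone wanting to check that
it took effect: sockets created afterwards inherit the default timeout. The
variable names are only for illustration. The timeout bounds individual
network operations, so it would not interrupt code that is sleeping between
polls, such as the time.sleep(wait) call in the traceback above.

```python
import socket

previous = socket.getdefaulttimeout()  # None means "block indefinitely"
socket.setdefaulttimeout(30)           # timeout in seconds
try:
    s = socket.socket()                # new sockets pick up the default
    applied = s.gettimeout()
    s.close()
finally:
    socket.setdefaulttimeout(previous)  # restore the earlier setting

print(applied)
```
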
> >
> >
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
>



-- 
Damian Menning, Ph.D.

"There are two types of academics. Those who use the Oxford comma, those
who don't and those who should."

Standard comma - You know Bob, Sue and Greg? They came to my house.
Oxford comma - You know Bob, Sue, and Greg? They came to my house.
Walken Comma - You know, Bob, Sue, and Greg? They came, to my house.
Shatner comma - You, know, Bob, Sue, and Greg? They, came, to my house.
-------------- next part --------------
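# Imports the script relies on (Python 3 syntax throughout)
import functools
from threading import Thread

from Bio import Entrez

# Checks to make sure input file is in the folder
fname = input("\nEnter file name with GB IDs you want pulled from NCBI: ")
try:
    fhand = open(fname, 'r')
except IOError:
    raise SystemExit("File does not exist in folder! Check file name and extension.")

# Output file for the downloaded records. The file name is an assumption;
# totalhand was not defined anywhere in this snippet.
totalhand = open("NCBI_sequences.fasta", 'w')

# Timeout timer from http://stackoverflow.com/questions/21827874/timeout-a-python-function-in-windows
def timeout(timeout):
    def deco(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Placeholder result: raised if the worker thread does not finish in time
            res = [Exception('function [%s] timeout [%s seconds] exceeded!' % (func.__name__, timeout))]
            def newFunc():
                try:
                    res[0] = func(*args, **kwargs)
                except Exception as e:
                    res[0] = e
            t = Thread(target=newFunc)
            t.daemon = True  # do not keep the interpreter alive for a stuck download
            try:
                t.start()
                t.join(timeout)  # wait at most `timeout` seconds
            except Exception as je:
                print('error starting thread')
                raise je
            ret = res[0]
            if isinstance(ret, BaseException):
                raise ret
            return ret
        return wrapper
    return deco

# NCBI nucleotide database search function
def NCBI_fetch(sequence):
    # Use the ID passed in rather than the global loop variable
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", id=sequence)
    data = fetch_handle.read()
    fetch_handle.close()
    totalhand.write(data)

# Fetch each GB ID and append its FASTA record to the output file
func = timeout(timeout=10)(NCBI_fetch)
for line in fhand:
    line = line.rstrip()
    if not line:
        continue  # skip blank lines
    try:
        func(line)
    except Exception as err:
        # Report the failure and move on to the next ID
        print("\nTimeout downloading %s (%s)" % (line, err))

fhand.close()
totalhand.close()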

