Re-attempt to open a URL with urllib in Python on timeout

January 15, 2017, at 5:35 PM

I am looking to parse data from a large number of webpages (more than 10k) using Python, and I am finding that the function I have written to do this hits a timeout error roughly once every 500 iterations. I have attempted to handle this with a try/except block, but I would like to improve the function so that it re-attempts to open the URL four or five times before raising the error. Is there an elegant way to do this?

My code is below:

from urllib.error import HTTPError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def url_open(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        s = urlopen(req, timeout=50).read()
    except HTTPError as e:
        print(str(e))
        if e.code == 404:
            return None  # page does not exist: nothing to parse
        # any other HTTP error: try once more, letting a second failure propagate
        s = urlopen(req, timeout=50).read()
    return BeautifulSoup(s, "lxml")
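Note that the try/except above never sees a timeout: urlopen raises HTTPError only when the server actually replies with an error status. A timeout while waiting for the response surfaces as socket.timeout (an alias of the built-in TimeoutError since Python 3.10), and a timeout while connecting is wrapped in URLError. The short check below of the standard-library exception hierarchy shows why catching HTTPError alone is not enough; a retry loop for this problem would most likely also need to catch socket.timeout and URLError, handling HTTPError first if it gets its own branch, since it subclasses URLError.

```python
import socket
from urllib.error import HTTPError, URLError

# HTTPError means the server answered with an error status (404, 503, ...).
print(issubclass(HTTPError, URLError))        # True
# A timeout while waiting for the response is a socket.timeout, not an HTTPError ...
print(issubclass(socket.timeout, HTTPError))  # False
# ... so it must be caught explicitly, or through a shared base class such as OSError.
print(issubclass(socket.timeout, OSError), issubclass(URLError, OSError))  # True True
```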
Answer 1

I've used a pattern like this for retrying in the past:

import time
from urllib.error import HTTPError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def url_open(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    retrycount = 0
    s = None
    while s is None:
        try:
            s = urlopen(req, timeout=50).read()
        except HTTPError as e:
            print(str(e))
            if canRetry(e.code):
                retrycount += 1
                if retrycount > 5:
                    raise  # give up after five retries
                time.sleep(1)  # pause briefly before the next attempt
            else:
                raise  # canRetry() rejected this status code
    return BeautifulSoup(s, "lxml")

You just have to define canRetry somewhere else: a function that takes the HTTP status code and returns True if the request is worth retrying.
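One possible way to fill in the missing piece is sketched below. It is only an assumption about what a sensible policy looks like: the helper is named can_retry (hypothetical) and takes the exception rather than just the status code, so that timeouts, which carry no status code, can be retried too. The set of retryable status codes, the exponential backoff and the opener parameter (there only so the loop can be exercised without network access) are all illustrative choices, not part of urllib.

```python
import socket
import time
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Hypothetical policy: status codes that usually indicate a transient failure.
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})


def can_retry(exc):
    """Return True if the exception looks transient and worth another attempt."""
    if isinstance(exc, HTTPError):  # test first: HTTPError subclasses URLError
        return exc.code in RETRYABLE_STATUS
    # Read timeouts (socket.timeout), connection resets and URLError (which wraps
    # connect timeouts, but also DNS failures) are all treated as transient here.
    return isinstance(exc, (socket.timeout, ConnectionError, URLError))


def read_with_retries(url, attempts=5, delay=1.0, timeout=50, opener=urlopen):
    """Return the body of url, making up to `attempts` tries with
    exponential backoff (delay, 2*delay, 4*delay, ...) between them."""
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(1, attempts + 1):
        try:
            with opener(req, timeout=timeout) as resp:
                return resp.read()
        except (socket.timeout, ConnectionError, URLError) as e:
            if not can_retry(e) or attempt == attempts:
                raise  # permanent error, or no attempts left
            wait = delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({e}); retrying in {wait:g}s")
            time.sleep(wait)
```

The parsing step then stays separate, e.g. BeautifulSoup(read_with_retries(url), "lxml"). Growing the pause after each failure is gentler on a struggling server than retrying at a fixed interval, which matters when the same site is hit more than 10k times.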
