When would Python get stuck at the time.sleep function?

Question

I'm currently using selenium in Python for a task that needs an endless loop to monitor some pages. Here's the code snippet:

records = set()
fileHandle = open('d:/seizeFloorRec.txt', 'a')
fileHandle.write('\ncur time: '+time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))+'\n')
driver = webdriver.Chrome()
while(True):
    try:
        print "time: ", time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
        subUrls = aMethod(driver) # an irrelevant function which returns a list
        time.sleep(2)
        for i in range(0, len(subUrls)): 
            print "cur_idx=["+str(i)+"], max_cnt=["+str(len(subUrls))+"]"
            try:
                rtn = monitorFloorKeyword(subUrls[i])
                time.sleep(1.5)
                if(rtn[0] == True):
                    if(rtn[1] not in records):
                        print "hit!"
                        records.add(rtn[1])
                        fileHandle.write(rtn[1]+'\t'+rtn[2].encode('utf-8')+'\n')
                        fileHandle.flush()
                    else:
                        print "hit but not write."
            except Exception as e:
                print "exception when get page: ", subUrls[i]
                print e.__doc__
                continue

        print "sleep 5*60 sec..."
        time.sleep(300)  # PROBLEM LIES HERE!!!
        print "sleep completes."

    except Exception as e:
        print 'exception!'
        print e.__doc__
        time.sleep(20)

It gets stuck at unpredictable times at time.sleep(300): the output shows "sleep 5*60 sec..." but never "sleep completes.".

Could anyone suggest a probable cause for this behavior? Thanks a lot!

UPDATED

I've found a similar problem here, but I don't really understand the point its author is making. I hope it sheds some light on my problem.

LATEST TEST

Since I'm using chromedriver, I added driver.get("about:blank") right before every return statement in each function, as shown below, to force any asynchronous page load of the current page to stop. This forced stop sometimes causes ERROR ipc_channel_win.cc(370)] pipe error: 109, which does NOT affect the running of my program. Could this be what affects my time.sleep call?

def retrieveCurHomePageAllSubjectUrls(driver):
    uri = "http://www.example.com/main.php?page=1"
    driver.get(uri)
    element = driver.find_elements_by_class_name('subject')
    subUrls = []
    for i in range(0, len(element)):
        subUrls.append(element[i].get_attribute('href').encode('utf-8'))
    driver.get("about:blank") #This is what I add
    return subUrls

def monitorFloorKeyword(subUrl):
    driver.get(subUrl)
    title = driver.find_element_by_id('subject_tpc').text
    content = driver.find_element_by_id('read_tpc').text
    if(title.find(u'keyword') >= 0 or content.find(u'keyword') >= 0):
        driver.get("about:blank") #This is what I add
        return (True,subUrl,title,content)
    driver.get("about:blank") #This is what I add
    return (False,)

SEEMS TO BE THE END

As I said above, a pipe error sometimes appears right after I call driver.get("about:blank"). Nevertheless, the good news is that everything works normally this time. If anyone knows something about selenium that is relevant to this post, please let me know; I'd really appreciate it.

Answer 1:

I took the time to simplify and clean up your code.

previously_seen_sub_urls = set()

with open('d:/seizeFloorRec.txt', 'a') as outfile:
    outfile.write(
        '\ncur time: ' +
        time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())) +
        '\n')
    driver = webdriver.Chrome()
    # The loop stays inside the "with" block so outfile is still open when written to.
    while True:
        try:
            print "time: ", time.strftime('%Y-%m-%d %H:%M:%S',
                                          time.localtime(time.time()))
            sub_urls = aMethod(driver) # an irrelevant function which returns a list

            time.sleep(2)  # Why sleep here?

            print "max_cnt=[%d]" % len(sub_urls)
            for i, sub_url in enumerate(sub_urls):
                print "cur_idx=[%s]" % i
                try:
                    rtn = monitorFloorKeyword(sub_url)
                    # rtn is either a length 1 tuple, first value False
                    # or a length 4 tuple, (True, sub_url, title, content)

                    time.sleep(1.5)

                    if rtn[0]:
                        if rtn[1] not in previously_seen_sub_urls:
                            print "hit!"
                            previously_seen_sub_urls.add(rtn[1])
                            outfile.write(rtn[1]+'\t'+rtn[2].encode('utf-8')+'\n')
                            outfile.flush()
                        else:
                            print "hit but not write."

                except Exception as e:  # Should catch specific subclass of Exception
                    print "exception when get page: ", sub_url
                    print e
                    # Continues

            print "sleep 5*60 sec..."
            time.sleep(300)  # PROBLEM POSSIBLY DOESN'T LIE HERE!!!
            print "sleep completes."

        except Exception as e:  #  Should catch specific subclass of Exception
            print 'exception!'
            print e
            time.sleep(20)
            # Continues

I haven't definitely found the problem, but I am suspicious of your exception handlers.

With your exception handlers, it is best to avoid "except Exception" apart from very limited situations (e.g. at the very outer loop of your code) because it shows you don't know what exception (or at least subclass of Exception) you are expecting to get, so it isn't clear whether the actions you take are correct.
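As a rough illustration of that point, here is a minimal sketch that catches only the failure it expects and lets everything else propagate. ElementMissing, read_title and safe_title are hypothetical names invented for this example (a dict stands in for a loaded page); in the real script the expected class would presumably be one of selenium's own exception types.

```python
class ElementMissing(Exception):
    """Hypothetical stand-in for a selenium 'element not found' error."""


def read_title(page):
    # Hypothetical lookup: a dict stands in for a loaded page.
    if "title" not in page:
        raise ElementMissing("no title element")
    return page["title"]


def safe_title(page):
    try:
        return read_title(page)
    except ElementMissing as e:
        # Expected failure: report it and carry on with the next page.
        print("skipping page: %s" % e)
        return None
    # Anything else (TypeError, NameError, ...) is not caught here,
    # so a genuine bug surfaces with a traceback instead of being hidden.


titles = [safe_title(p) for p in ({"title": "A"}, {}, {"title": "B"})]
print(titles)
```

Passing something that is not a page at all, such as safe_title(None), raises TypeError rather than being silently swallowed, which is the behavior a blanket "except Exception" would mask.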

The second problem is that you don't print the exception, but you print the exception's doc string. For Python's built-in exceptions, these strings may be useful, but they are not guaranteed to be set for custom exceptions. You may find exceptions are not displaying.

This doesn't explain your problem, but I would be interested to see if changing it to print the exception directly, rather than e.__doc__ would help. (See also the traceback module to learn more about where an exception has come from.)
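To make the difference concrete, the sketch below raises a custom exception and compares the three ways of reporting it. PageLoadError and describe are hypothetical names used only for illustration; the traceback call is from the standard library.

```python
import traceback


class PageLoadError(Exception):
    # Hypothetical custom exception, deliberately left without a docstring.
    pass


def describe(exc):
    """Collect what each printing style would show for the given exception."""
    return {
        "doc": exc.__doc__,   # the class docstring, unrelated to this failure
        "message": str(exc),  # the message passed when the exception was raised
    }


try:
    raise PageLoadError("timed out loading http://www.example.com/main.php?page=1")
except PageLoadError as e:
    info = describe(e)
    trace = traceback.format_exc()  # full traceback as a string

print(info["doc"])      # what "print e.__doc__" would have shown
print(info["message"])  # what "print e" would have shown
print(trace)            # file, line number and call chain of the failure
```

For a class without a docstring, the first print shows None, so the handler in the question would report nothing useful for such an exception.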

Answer 2:

Get rid of time.sleep and try implicitly_wait, which makes the driver keep polling for up to the given number of seconds when it looks for an element:

from selenium import webdriver

ff = webdriver.Firefox()
ff.implicitly_wait(30)  # seconds

Or try WebDriverWait, which waits until a specific condition holds:

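from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

ff = webdriver.Firefox()
ff.get("http://somedomain/url_that_delays_loading")
try:
    # Wait up to 10 seconds for the element to appear in the DOM.
    element = WebDriverWait(ff, 10).until(EC.presence_of_element_located((By.ID, "myDynamicElement")))
finally:
    ff.quit()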

Also read up on waits in the selenium documentation.
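For readers unfamiliar with the idea, an explicit wait is essentially a polling loop with a deadline. The following is a minimal standard-library sketch of that idea, not selenium's implementation; wait_until and its parameters are hypothetical names chosen for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper: returns the truthy value, or raises RuntimeError
    once `timeout` seconds have passed without success.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        time.sleep(poll)


# Usage: a stand-in condition that succeeds on the third attempt.
attempts = []


def ready():
    attempts.append(1)
    return len(attempts) >= 3


outcome = wait_until(ready, timeout=2.0, poll=0.01)
print(outcome, len(attempts))
```

Compared with a fixed time.sleep, a loop like this returns as soon as the condition holds and fails loudly when it never does.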

Source: https://stackoverflow.com/questions/21086686/when-would-python-stuck-at-time-sleep-function

Tags

python

selenium

thread-sleep