Question
Currently, I'm using Selenium in Python to do something that needs a never-ending loop to monitor what I want. Here's the code snippet:
import time
from selenium import webdriver

records = set()
fileHandle = open('d:/seizeFloorRec.txt', 'a')
fileHandle.write('\ncur time: '+time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))+'\n')
driver = webdriver.Chrome()
while(True):
    try:
        print "time: ", time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
        subUrls = aMethod(driver) # an irrelevant function which returns a list
        time.sleep(2)
        for i in range(0, len(subUrls)):
            print "cur_idx=["+str(i)+"], max_cnt=["+str(len(subUrls))+"]"
            try:
                rtn = monitorFloorKeyword(subUrls[i])
                time.sleep(1.5)
                if(rtn[0] == True):
                    if(rtn[1] not in records):
                        print "hit!"
                        records.add(rtn[1])
                        fileHandle.write(rtn[1]+'\t'+rtn[2].encode('utf-8')+'\n')
                        fileHandle.flush()
                    else:
                        print "hit but not write."
            except Exception as e:
                print "exception when get page: ", subUrls[i]
                print e.__doc__
                continue
        print "sleep 5*60 sec..."
        time.sleep(300) # PROBLEM LIES HERE!!!
        print "sleep completes."
    except Exception as e:
        print 'exception!'
        print e.__doc__
        time.sleep(20)
It keeps getting stuck at time.sleep(300) at unpredictable moments: the output shows "sleep 5*60 sec..." but never "sleep completes.".
Could anyone suggest a probable cause for this behavior? Thanks a lot!
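One way to check whether the script is really blocked inside time.sleep(300) is to split the long sleep into short chunks and report a timestamped heartbeat after each one: if the heartbeats keep arriving and "sleep completes." eventually prints, the sleep itself is not the problem. The sketch below is written for Python 3, and the helper sleep_with_heartbeat and its parameters are hypothetical, not part of the original code.

```python
import time


def sleep_with_heartbeat(total_seconds, chunk_seconds=30.0, report=None):
    """Hypothetical drop-in for time.sleep(total_seconds) that reports progress.

    Sleeps in chunks of at most chunk_seconds and calls report(message)
    after each chunk. Returns the number of heartbeats reported.
    """
    if report is None:
        def report(message):
            # flush=True so buffered output cannot hide the heartbeats
            print(message, flush=True)

    start = time.monotonic()
    beats = 0
    while True:
        remaining = total_seconds - (time.monotonic() - start)
        if remaining <= 0:
            return beats
        # Never sleep past the requested total.
        time.sleep(min(chunk_seconds, remaining))
        beats += 1
        report("%s heartbeat: %.1f of %.1f s elapsed"
               % (time.strftime('%H:%M:%S'),
                  time.monotonic() - start, total_seconds))
```

In the loop above, time.sleep(300) would become sleep_with_heartbeat(300); each heartbeat line then shows that the process is still alive during the wait.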
UPDATED
I've found a similar problem here, but I don't really understand the point the author is trying to make. Hopefully it is relevant to my problem.
LATEST TEST
Since I'm using chromedriver, I added driver.get("about:blank") right before every return statement in each function (as below) to force the asynchronous loading of the current page to stop. This forced stop sometimes causes ERROR ipc_channel_win.cc(370)] pipe error: 109, which does NOT affect the running of my program. Could this be what affects my time.sleep function?
def retrieveCurHomePageAllSubjectUrls(driver):
    uri = "http://www.example.com/main.php?page=1"
    driver.get(uri)
    element = driver.find_elements_by_class_name('subject')
    subUrls = []
    for i in range(0, len(element)):
        subUrls.append(element[i].get_attribute('href').encode('utf-8'))
    driver.get("about:blank") # This is what I added
    return subUrls

def monitorFloorKeyword(subUrl):
    driver.get(subUrl)
    title = driver.find_element_by_id('subject_tpc').text
    content = driver.find_element_by_id('read_tpc').text
    if(title.find(u'keyword') >= 0 or content.find(u'keyword') >= 0):
        driver.get("about:blank") # This is what I added
        return (True,subUrl,title,content)
    driver.get("about:blank") # This is what I added
    return (False,)
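Instead of repeating driver.get("about:blank") before every return, the reset can go in a try/finally block so that it runs on every exit path, including when find_element_by_id raises. The following is a minimal Python 3 sketch of that pattern, not the asker's code: FakeDriver and FakeElement are hypothetical stand-ins so it runs without a browser, the driver is passed as a parameter instead of being a global, and monitor_floor_keyword and its keyword parameter are illustrative names. find_element_by_id is the Selenium 3 style call used in the question; newer Selenium versions use find_element(By.ID, ...).

```python
class FakeElement:
    """Hypothetical stand-in for a WebElement; only .text is used."""
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (demonstration only)."""
    def __init__(self, pages):
        self.pages = pages      # url -> {element_id: text}
        self.visited = []       # every URL passed to get(), in order
        self.current = {}

    def get(self, url):
        self.visited.append(url)
        self.current = self.pages.get(url, {})

    def find_element_by_id(self, element_id):
        if element_id not in self.current:
            raise LookupError("no element with id %r" % element_id)
        return FakeElement(self.current[element_id])


def monitor_floor_keyword(driver, sub_url, keyword=u'keyword'):
    driver.get(sub_url)
    try:
        title = driver.find_element_by_id('subject_tpc').text
        content = driver.find_element_by_id('read_tpc').text
        if keyword in title or keyword in content:
            return (True, sub_url, title, content)
        return (False,)
    finally:
        # Runs after either return statement and also when an exception is
        # raised, so the page is always reset to about:blank exactly once.
        driver.get("about:blank")
```

With this structure the cleanup lives in one place, so a new early return cannot accidentally skip it.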
SEEMS TO BE THE END
As I said above, there is sometimes a pipe error right after I call driver.get("about:blank"). Nevertheless, the good news is that everything works normally this time. If anyone knows anything about Selenium that is relevant to this post, please let me know; I'd really appreciate it.
Answer 1:
I took the time to simplify and clean up your code.
import time
from selenium import webdriver

previously_seen_sub_urls = set()
with open('d:/seizeFloorRec.txt', 'a') as outfile:
    outfile.write(
        '\ncur time: ' +
        time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())) +
        '\n')
    driver = webdriver.Chrome()
    while True:
        try:
            print "time: ", time.strftime('%Y-%m-%d %H:%M:%S',
                                          time.localtime(time.time()))
            sub_urls = aMethod(driver) # an irrelevant function which returns a list
            time.sleep(2) # Why sleep here?
            print "max_cnt=[%d]" % len(sub_urls)
            for i, sub_url in enumerate(sub_urls):
                print "cur_idx=[%s]" % i
                try:
                    rtn = monitorFloorKeyword(sub_url)
                    # rtn is either a length 1 tuple, first value False
                    # or a length 4 tuple, (True, sub_url, title, content)
                    time.sleep(1.5)
                    if rtn[0]:
                        if rtn[1] not in previously_seen_sub_urls:
                            print "hit!"
                            previously_seen_sub_urls.add(rtn[1])
                            outfile.write(rtn[1]+'\t'+rtn[2].encode('utf-8')+'\n')
                            outfile.flush()
                        else:
                            print "hit but not write."
                except Exception as e: # Should catch specific subclass of Exception
                    print "exception when get page: ", sub_url
                    print e
                    # Continues
            print "sleep 5*60 sec..."
            time.sleep(300) # PROBLEM POSSIBLY DOESN'T LIE HERE!!!
            print "sleep completes."
        except Exception as e: # Should catch specific subclass of Exception
            print 'exception!'
            print e
            time.sleep(20)
            # Continues
I haven't definitively found the problem, but I am suspicious of your exception handlers.
First, it is best to avoid "except Exception" apart from very limited situations (e.g. the outermost loop of your code), because it shows you don't know which exception (or at least which subclass of Exception) you are expecting, so it isn't clear whether the actions you take in the handler are correct.
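As an illustration of catching a specific exception instead of Exception, the Python 3 sketch below treats only one expected failure type as skippable and lets anything unexpected propagate. It assumes, hypothetically, that the page-fetching function signals a transient failure with the built-in TimeoutError, standing in for a Selenium exception such as selenium.common.exceptions.TimeoutException; process_pages and fetch are illustrative names.

```python
def process_pages(urls, fetch):
    """Fetch each URL, skipping only pages that time out.

    Any other exception (a programming error, a dead driver, ...) is not
    caught here, so it surfaces instead of being silently swallowed.
    Returns (results, skipped_urls).
    """
    results, skipped = [], []
    for url in urls:
        try:
            page = fetch(url)  # keep the try block as narrow as possible
        except TimeoutError as e:  # the one failure we expect and can skip
            print("timed out on %s: %r" % (url, e))
            skipped.append(url)
            continue
        results.append(page)
    return results, skipped
```

Keeping only the fetch call inside the try block also means a bug in the result handling is not mistaken for a page-load failure.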
The second problem is that you print the exception's docstring rather than the exception itself. For Python's built-in exceptions the docstring may be somewhat useful, but it describes only the exception class, not the specific error, and it is not guaranteed to be set for custom exceptions. You may find that some exceptions are not displayed meaningfully.
This doesn't explain your problem, but I would be interested to see whether printing the exception directly, rather than e.__doc__, helps. (See also the traceback module to learn more about where an exception has come from.)
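To see the difference in practice: e.__doc__ is the docstring of the exception's class (None for a custom exception class without a docstring), repr(e) carries the specific message, and traceback.format_exc() shows where the exception was raised. A short Python 3 sketch; PageError, load_page, and describe_failure are hypothetical names.

```python
import traceback


class PageError(Exception):
    pass  # custom exception without a docstring, so PageError.__doc__ is None


def load_page(url):
    raise PageError("could not load %s" % url)


def describe_failure(url):
    """Return the three things one might print for a failed load."""
    try:
        load_page(url)
    except Exception as e:
        return {
            "doc": e.__doc__,                     # class docstring: None here
            "message": repr(e),                   # the specific error message
            "traceback": traceback.format_exc(),  # where it was raised
        }
```

Printing only the "doc" entry would show None here, while the other two identify the page and the line that failed.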
Answer 2:
So get rid of time.sleep and try using implicitly_wait:
ff = webdriver.Firefox()
ff.implicitly_wait(30)
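or try using WebDriverWait:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

ff = webdriver.Firefox()
ff.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(ff, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement")))
finally:
    ff.quit()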
Also check the Selenium documentation on waits.
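Conceptually, an explicit wait such as WebDriverWait(ff, 10).until(...) repeatedly evaluates a condition until it returns something truthy or the timeout expires, instead of sleeping for a fixed time. The following is a simplified pure-Python 3 sketch of that polling idea, not Selenium's actual implementation; wait_until, poll_interval, and the use of the built-in TimeoutError are assumptions for illustration (Selenium itself raises selenium.common.exceptions.TimeoutException).

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout expires.

    Returns the truthy value. Raises TimeoutError once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        # Sleep briefly, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```

In the question's loop, this style of wait could replace fixed pauses such as time.sleep(2) after loading a page, for example by waiting until driver.find_elements_by_class_name('subject') returns a non-empty list.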
Source: https://stackoverflow.com/questions/21086686/when-would-python-stuck-at-time-sleep-function