Here is a simple program that does not work:
from ghost import Ghost
ghost = Ghost(wait_timeout=40)
page, extra_resources = ghost.open("http://samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/")
ghost.wait_page_loaded()
n=2;
links=ghost.evaluate("alist=document.getElementsByTagName('a');alist")
print links
The error is:
raise Exception(timeout_message)
Exception: Unable to load requested page
Is there some problem with the program?
It seems that other people are reporting similar issues without getting any real explanation (for example: https://github.com/jeanphix/Ghost.py/issues/26).
Change the evaluate line to the following, which is taken from the ghost.py documentation:
links = ghost.evaluate("""
    var links = document.querySelectorAll("a");
    var listRet = [];
    for (var i=0; i<links.length; i++){
        listRet.push(links[i].href);
    }
    listRet;
""")
I was getting this error with every page I tried when I first installed Ghost.py. I solved it by scrapping PyQt and installing PySide instead. That fixed it for me, anyway.
I had to add extra logic to the wait_for_page_loaded function in ghost.py:
reTmp = str(resource.url)
if "PyQt4" in reTmp:
    reTmp = str(reTmp).replace("PyQt4.QtCore.QUrl(u\'", "").replace("\')","")
if url == reTmp:
    page = resource
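PyQt was adding junk to resource.url (the string came back wrapped as PyQt4.QtCore.QUrl(u'...') instead of the bare URL), so the url == resource.url comparison never matched and the page was never treated as loaded.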
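The string cleanup above can also be written as a small standalone helper, which makes it easy to check without Ghost.py or Qt installed. This is only a sketch: the name normalize_resource_url is hypothetical and not part of ghost.py, and it assumes the wrapper always has the PyQt4.QtCore.QUrl(u'...') shape shown in the patch.

```python
import re

# Pattern for the wrapper seen in the patch above, e.g.
# PyQt4.QtCore.QUrl(u'http://example.com/'). The "u" prefix is optional here.
_QURL_REPR = re.compile(r"^PyQt4\.QtCore\.QUrl\(u?'(.*)'\)$")


def normalize_resource_url(raw):
    """Hypothetical helper: return the bare URL from str(resource.url).

    Strings that do not carry the PyQt4 wrapper are returned unchanged.
    """
    text = str(raw)
    match = _QURL_REPR.match(text)
    return match.group(1) if match else text
```

With a helper like this, the comparison in the patch would become url == normalize_resource_url(resource.url).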
ghost.py requires either PySide (preferred) or PyQt Qt bindings:
pip install pyside
pip install ghost.py --pre
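To see which binding is actually importable in your environment before running Ghost.py, something like the following may help. It is a rough sketch: find_qt_binding is a hypothetical helper, not part of ghost.py, and it assumes the bindings are importable under the module names PySide and PyQt4.

```python
import importlib


def find_qt_binding(candidates=("PySide", "PyQt4")):
    """Hypothetical helper: return the first importable module name, or None.

    Candidates are tried in order, so PySide (the preferred binding)
    wins when both are installed.
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
        return name
    return None
```

Calling find_qt_binding() with no arguments checks PySide first and then PyQt4. If it returns "PyQt4", the mismatch described in the other answers may apply.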
Try installing PySide instead of PyQt. This worked for me.
Source: https://stackoverflow.com/questions/14575181/screen-scraping-using-ghost-py