I want to use PhantomJS in Python. I googled this problem but couldn\'t find proper solutions.
I find os.popen()
may be a good choice. But I couldn\'t
The answer by @Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:
Install PhantomJS
As @Vivin-Paliath points out, it's a standalone project, not part of Node.
Mac:
brew install phantomjs
Ubuntu:
sudo apt-get install phantomjs
etc
Set up a virtualenv
(if you haven't already):
virtualenv mypy # doesn't have to be "mypy". Can be anything.
. mypy/bin/activate
If your machine has both Python 2 and 3 you may need run virtualenv-3.6 mypy
or similar.
Install selenium:
pip install selenium
Try a simple test, like this borrowed from the docs:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.PhantomJS()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
this is what I do, python3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.
command = "phantomjs --ignore-ssl-errors=true "+<your js file for phantom>
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
output, errors = process.communicate(timeout=30)
except Exception as e:
print("\t\tException: %s" % e)
process.kill()
# output will be weird, decode to utf-8 to save heartache
phantom_output = ''
for out_line in output.splitlines():
phantom_output += out_line.decode('utf-8')
In case you are using Buildout, you can easily automate the installation processes that Pykler describes using the gp.recipe.node recipe.
[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs
That part installs node.js as binary (at least on my system) and then uses npm to install PhantomJS. Finally it creates an entry point bin/phantomjs
, which you can call the PhantomJS webdriver with. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)
driver = webdriver.PhantomJS('bin/phantomjs')
The easiest way to use PhantomJS in python is via Selenium. The simplest installation method is
npm -g install phantomjs-prebuilt
After installation, you may use phantom as simple as:
from selenium import webdriver
driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()
If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS()
. Replace this:
driver = webdriver.PhantomJS() # or add to your PATH
... with the following:
driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
References:
If using Anaconda, install with:
conda install PhantomJS
in your script:
from selenium import webdriver
driver=webdriver.PhantomJS()
works perfectly.
Now since the GhostDriver comes bundled with the PhantomJS, it has become even more convenient to use it through Selenium.
I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess standalone installation didn't provided these features earlier, but as of v1.9, it very much does so.
Now you can use like this
import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing
driver.quit()