Is there a way to use PhantomJS in Python?

死守一世寂寞 2020-11-22 08:05

I want to use PhantomJS in Python. I googled this problem but couldn't find a proper solution.

I found that os.popen() may be a good choice. But I couldn't

8 answers
  • 2020-11-22 08:13

    The answer by @Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:

    1. Install PhantomJS

      As @Vivin-Paliath points out, it's a standalone project, not part of Node.

      Mac:

      brew install phantomjs
      

      Ubuntu:

      sudo apt-get install phantomjs
      

      etc

    2. Set up a virtualenv (if you haven't already):

      virtualenv mypy  # doesn't have to be "mypy". Can be anything.
      . mypy/bin/activate
      

      If your machine has both Python 2 and 3, you may need to run virtualenv-3.6 mypy or similar.

    3. Install selenium:

      pip install selenium
      
    4. Try a simple test, like this one borrowed from the docs:

      from selenium import webdriver
      from selenium.webdriver.common.keys import Keys
      
      driver = webdriver.PhantomJS()
      driver.get("http://www.python.org")
      assert "Python" in driver.title
      elem = driver.find_element_by_name("q")
      elem.clear()
      elem.send_keys("pycon")
      elem.send_keys(Keys.RETURN)
      assert "No results found." not in driver.page_source
      driver.close()
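
If one of the asserts in a test like this fails, the cleanup call at the end is never reached and the browser process may be left running. A minimal sketch of one way to guard against that, assuming a small helper named `managed_driver` (hypothetical, not part of Selenium) and a `FakeDriver` stand-in so the pattern can be shown without a browser:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Yield a driver built by `factory` and always call quit() afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the body raised, e.g. on a failed assert


class FakeDriver:
    """Stand-in used only to demonstrate the pattern without a browser."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        raise ValueError("simulated test failure")
except ValueError:
    pass  # the failure still propagates; quit() has already run
```

With Selenium installed, the same helper could presumably be used as `with managed_driver(webdriver.PhantomJS) as driver:` followed by the test body.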
      
  • 2020-11-22 08:15

    This is what I do on Python 3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.

    import subprocess

    script_path = "your_script.js"  # placeholder: your js file for phantom
    command = ["phantomjs", "--ignore-ssl-errors=true", script_path]
    process = subprocess.Popen(command, stdout=subprocess.PIPE)

    # make sure phantomjs has time to download/process the page
    # but if we get nothing after 30 sec, just move on
    try:
        output, errors = process.communicate(timeout=30)
    except subprocess.TimeoutExpired as e:
        print("\t\tException: %s" % e)
        process.kill()
        # collect whatever was written before the kill, so output is always defined
        output, errors = process.communicate()

    # output is bytes; decode to utf-8 to save heartache
    phantom_output = ''.join(out_line.decode('utf-8') for out_line in output.splitlines())
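
For running over a long list of sites, the same timeout idea can be wrapped in a function. This is only a sketch: it uses `subprocess.run` (Python 3.5+) instead of `Popen`, the helper names `phantom_command` and `run_with_timeout` are made up for illustration, and it assumes your PhantomJS script reads the URL from its command-line arguments. The demo lines use the Python interpreter as a stand-in for phantomjs.

```python
import subprocess
import sys


def phantom_command(script_path, url):
    """Build the argument list for one PhantomJS run (script_path is your own JS file)."""
    return ["phantomjs", "--ignore-ssl-errors=true", script_path, url]


def run_with_timeout(command, timeout=30):
    """Run `command`; return its decoded stdout, or None if it timed out."""
    try:
        completed = subprocess.run(command, stdout=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point
        return None
    return completed.stdout.decode("utf-8")


# Demo with the Python interpreter standing in for phantomjs
fast = run_with_timeout([sys.executable, "-c", "print('ok')"])
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
```

A loop over sites would then call `run_with_timeout(phantom_command(script_path, url))` for each URL and skip the ones that return None.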
    
  • 2020-11-22 08:17

    In case you are using Buildout, you can easily automate the installation processes that Pykler describes using the gp.recipe.node recipe.

    [nodejs]
    recipe = gp.recipe.node
    version = 0.10.32
    npms = phantomjs
    scripts = phantomjs
    

    That part installs node.js as a binary (at least on my system) and then uses npm to install PhantomJS. Finally, it creates an entry point, bin/phantomjs, which you can pass to the PhantomJS webdriver. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)

    from selenium import webdriver
    driver = webdriver.PhantomJS('bin/phantomjs')
    
  • 2020-11-22 08:18

    The easiest way to use PhantomJS in Python is via Selenium. The simplest installation method is:

    1. Install NodeJS
    2. Using Node's package manager, install phantomjs: npm -g install phantomjs-prebuilt
    3. Install selenium (in your virtualenv, if you are using one)

    After installation, you can use PhantomJS as simply as this:

    from selenium import webdriver
    
    driver = webdriver.PhantomJS() # or add to your PATH
    driver.set_window_size(1024, 768) # optional
    driver.get('https://google.com/')
    driver.save_screenshot('screen.png') # save a screenshot to disk
    sbtn = driver.find_element_by_css_selector('button.gbqfba')
    sbtn.click()
    

    If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:

    driver = webdriver.PhantomJS() # or add to your PATH
    

    ... with the following:

    driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
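
Rather than hard-coding one path, the lookup can be done at runtime. A rough sketch, assuming a helper named `resolve_executable` (hypothetical) that checks PATH first and then a list of fallback locations you supply yourself; the demo uses the running Python interpreter as a stand-in fallback binary:

```python
import os
import shutil
import sys


def resolve_executable(name, extra_candidates=()):
    """Return the first usable path for `name`: PATH first, then explicit candidates."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in extra_candidates:
        # accept only existing files that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Demo: the name is not on PATH, so the explicit fallback is returned
demo = resolve_executable("no-such-binary-xyz-123", [sys.executable])
```

The result of `resolve_executable("phantomjs", [...])` could then be passed as `executable_path`, with a clear error raised when it is None.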
    

    References:

    • http://selenium-python.readthedocs.io/
    • How do I set a proxy for phantomjs/ghostdriver in python webdriver?
    • https://dzone.com/articles/python-testing-phantomjs
  • 2020-11-22 08:19

    If using Anaconda, install with:

    conda install PhantomJS
    

    Then, in your script:

    from selenium import webdriver
    driver = webdriver.PhantomJS()
    

    This works perfectly.

  • 2020-11-22 08:23

    Now that GhostDriver comes bundled with PhantomJS, it has become even more convenient to use it through Selenium.

    I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess the standalone installation didn't provide these features earlier, but as of v1.9, it very much does.

    1. Install PhantomJS (http://phantomjs.org/download.html) (If you are on Linux, the following instructions will help: https://stackoverflow.com/a/14267295/382630)
    2. Install Selenium using pip.

    Now you can use it like this:

    import selenium.webdriver
    driver = selenium.webdriver.PhantomJS()
    driver.get('http://google.com')
    # do some processing
    
    driver.quit()
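
One caveat when installing Selenium with pip today: to my knowledge, the PhantomJS driver was deprecated in Selenium 3.8 and removed in Selenium 4, so `webdriver.PhantomJS` only exists in older releases. A small sketch of a version check, using a hypothetical helper named `supports_phantomjs`:

```python
def supports_phantomjs(selenium_version):
    """Return True if the given Selenium version string predates 4.0."""
    major = int(selenium_version.split(".")[0])
    return major < 4


# With Selenium installed, the real version string is selenium.__version__:
#   import selenium
#   supports_phantomjs(selenium.__version__)
old_release = supports_phantomjs("3.141.0")
new_release = supports_phantomjs("4.10.0")
```

If the check fails, pinning an older release (for example `pip install "selenium<4"`) is one option.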
    