PhantomJS returning empty web page (python, Selenium)

感动是毒 2020-12-15 06:51

I am trying to screen-scrape a website from a Python script (using Selenium) without having to launch an actual browser instance. I can do this with Chrome or Firefox - I've trie

3 Answers
  • 2020-12-15 07:18

    I was facing the same problem, and no amount of code to make the driver wait helped.
    The problem is SSL errors on HTTPS sites; telling PhantomJS to ignore them does the trick.

    Call the PhantomJS driver as:

    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])
    

    This solved the problem for me.
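
For readers who want to toggle these flags, here is a minimal sketch of a helper that builds the `service_args` list. The names `phantomjs_service_args` and `make_driver` are hypothetical, not part of Selenium. It assumes a Selenium release that still provides `webdriver.PhantomJS`, which was deprecated in later 3.x releases and removed in Selenium 4.

```python
def phantomjs_service_args(ignore_ssl_errors=True, ssl_protocol="TLSv1"):
    """Build the command-line flags handed to the PhantomJS process.

    Hypothetical helper: it only assembles the two flags used in the
    answer above and does not validate the protocol name.
    """
    args = []
    if ignore_ssl_errors:
        args.append("--ignore-ssl-errors=true")
    if ssl_protocol:
        args.append("--ssl-protocol=" + ssl_protocol)
    return args


def make_driver():
    """Start PhantomJS with the flags above (needs Selenium and PhantomJS)."""
    # Imported lazily so phantomjs_service_args() works without Selenium.
    # Assumption: the installed Selenium still ships webdriver.PhantomJS.
    from selenium import webdriver
    return webdriver.PhantomJS(service_args=phantomjs_service_args())
```

If a server rejects TLSv1, passing `ssl_protocol="any"` may be worth trying, since `any` is another value PhantomJS accepts for `--ssl-protocol`.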

  • 2020-12-15 07:18

    You need to wait for the page to load. This is usually done with an Explicit Wait, which blocks until a key element is present or visible on the page. For instance:

    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    
    # ...
    browser.get("https://www.whatever.com")
    
    wait = WebDriverWait(browser, 10)  # use the same driver object that loaded the page
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.content")))
    
    html_source = browser.page_source
    # ...
    

    Here, we'll wait up to 10 seconds for a div element with class="content" to become visible before getting the page source.
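
Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or the timeout expires. The sketch below is a simplified stand-in for illustration, not Selenium's implementation; `wait_until` and `WaitTimeout` are hypothetical names, and the 0.5 second default mirrors the default poll interval of `WebDriverWait`.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout elapses.

    Returns the truthy value, much as WebDriverWait.until returns the
    located element. Unlike Selenium, this sketch does not swallow
    exceptions raised by the condition.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

With a real driver, the condition could be something like `lambda: browser.find_elements(By.CSS_SELECTOR, "div.content")`, which returns an empty (falsy) list until the element exists.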


    Additionally, you may need to ignore SSL errors:

    # dcap is a desired-capabilities dict defined elsewhere (not shown in this answer)
    browser = webdriver.PhantomJS(desired_capabilities=dcap, service_args=['--ignore-ssl-errors=true'])
    

    That said, I'm fairly sure this is related to the redirect-handling issues in PhantomJS. There is an open ticket in the PhantomJS bug tracker:

    • PhantomJS does not follow some redirects
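
Whatever the cause, it can help to detect the empty result explicitly rather than parse it as if it were real content. The check below is a rough sketch: `looks_blank` is a hypothetical helper, and it assumes the failed load comes back as a bare skeleton such as `<html><head></head><body></body></html>`. A page consisting only of images would also count as blank here, so treat it as a heuristic.

```python
import re

# Matches any tag, so removing matches leaves only text content.
_TAG = re.compile(r"<[^>]*>")


def looks_blank(page_source):
    """Return True when the markup contains no text outside of tags."""
    text = _TAG.sub("", page_source or "")
    return text.strip() == ""
```

A script could call `looks_blank(browser.page_source)` after `browser.get(...)` and retry with different `service_args` when it returns True.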
  • 2020-12-15 07:26

    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])

    This worked for me.
