Here is my situation: I have to log in to a website and download a CSV from there, headless, from a Linux server. The page uses JS and does not work without it.
I found a solution and wanted to share it.
One requirement changed: I am no longer using PhantomJS but ChromeDriver, which runs headlessly with a virtual framebuffer. The result is the same and it gets the job done.
What you need is:
pip install selenium pyvirtualdisplay
apt-get install xvfb
Download ChromeDriver
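Before running anything, a quick check that these pieces are in place can save some debugging. This is only a rough sketch: the function name missing_prerequisites is made up for illustration, and the default path ./chromedriver assumes the driver binary sits next to where you run the script.

```python
import importlib.util
import os
import shutil


def missing_prerequisites(chromedriver_path="./chromedriver"):
    """Return a list describing every prerequisite that could not be found."""
    missing = []
    # The two Python packages installed with pip.
    for module in ("selenium", "pyvirtualdisplay"):
        if importlib.util.find_spec(module) is None:
            missing.append("Python package " + module)
    # The xvfb package provides the Xvfb binary on the PATH.
    if shutil.which("Xvfb") is None:
        missing.append("Xvfb binary (apt-get install xvfb)")
    # The ChromeDriver download must exist and be executable.
    if not (os.path.isfile(chromedriver_path) and os.access(chromedriver_path, os.X_OK)):
        missing.append("executable ChromeDriver at " + chromedriver_path)
    return missing


if __name__ == "__main__":
    for item in missing_prerequisites():
        print("Missing:", item)
```

If it prints nothing, all four pieces were found.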
I use Python 3.5 and a test file from ovh.net, which is linked with an <a> tag instead of a button.
The script waits for the <a> element to be present on the page, then clicks it. If you don't wait for the element and the site loads its content asynchronously, the element you try to click might not be there yet. The download location is a folder relative to the directory the script is run from. The script checks that directory once per second to see whether the file has been downloaded. If I am not wrong, Chrome keeps the file under a temporary name (.crdownload) during the download, and as soon as it appears as the .dat specified in filename, the script finishes. If you close the virtual framebuffer and the driver before that, the download will not complete.
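The waiting step can also be written with a timeout, so the script does not hang forever if the download never starts. The following is a minimal sketch, not part of my script: wait_for_download and its parameters are hypothetical names, and it assumes Chrome marks unfinished downloads with a .crdownload suffix.

```python
import glob
import os
import time


def wait_for_download(directory, filename, timeout=120, poll=1.0):
    """Return the path of the finished file, or None if the timeout expires."""
    target = os.path.join(directory, filename)
    # Assumption: Chrome writes unfinished downloads as *.crdownload files.
    pattern = os.path.join(glob.escape(directory), "*.crdownload")
    deadline = time.monotonic() + timeout
    while True:
        # Done once the final file exists and no partial file is left over.
        if os.path.isfile(target) and not glob.glob(pattern):
            return target
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

You would call it as wait_for_download(dl_location, filename) after the click and only close the driver and the display once it has returned.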
The complete script looks like this:
#!/usr/bin/env python3
# coding: utf-8
import glob
import os
import sys
import time

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def main(argv):
    url = 'http://ovh.net/files'
    dl_dir = 'downloads'
    filename = '1Mio.dat'

    # Start the virtual framebuffer so Chrome has a display to render to.
    display = Display(visible=0, size=(800, 600))
    display.start()

    # Tell Chrome to save downloads into ./downloads, relative to the working directory.
    chrome_options = webdriver.ChromeOptions()
    dl_location = os.path.join(os.getcwd(), dl_dir)
    prefs = {"download.default_directory": dl_location}
    chrome_options.add_experimental_option("prefs", prefs)

    # Selenium 3 style arguments; Selenium 4 expects a Service object and options= instead.
    chromedriver = "./chromedriver"
    driver = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)
    driver.set_window_size(800, 600)
    driver.get(url)

    # Wait up to 30 seconds for the link; until() returns the located element.
    xpath = '//a[@href="' + filename + '"]'
    hyperlink = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath)))
    hyperlink.click()

    # Poll once per second until the finished file shows up.
    while not glob.glob(os.path.join(dl_location, filename)):
        time.sleep(1)

    # Shut down only after the download has completed.
    driver.quit()
    display.stop()


if __name__ == '__main__':
    main(sys.argv)
I hope this helps someone in the future.