I\'ve been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem. I\'m able to easily navigate through the DOM and click, hover etc. D
My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script()
function.
js = '''
var callback = arguments[0];
var theForm = document.forms['theFormId'];
data = new FormData();
data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
data.append('otherFormField', theForm.otherFormField.value);
var xhr = new XMLHttpRequest();
xhr.open('POST', theForm.action, true);
'''
for cookie in driver.get_cookies():
js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '
js += '''
xhr.onload = function () {
callback(this.responseText);
};
xhr.send(data);
'''
driver.set_script_timeout(30)
file = driver.execute_async_script(js)
PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:
As far as I understand, you have at least 3 options:
casperjs
(and you should leave python here)xvfb
Here are also some links that might help too:
Is not posible in that way. You can use other alternatives to download files like wget o curl.
Use firefox to find the right request and selenium to get the values for that and finally use out of to the box to download the file
curlCall=" curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)
Despite this question is quite old, downloading files through PhantomJS
is still a problem. But we can use PhantomJS to get download link and fetch all needed cookies such as csrf tokens and so on. And then we can use requests
to download it actually:
import requests
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
download_link = driver.find_element_by_id('download_link')
session = requests.Session()
cookies = driver.get_cookies()
for cookie in cookies:
session.cookies.set(cookie['name'], cookie['value'])
response = session.get(download_link)
And now in response.content
actual file content should appear. We can next write it with open
or do whatever we want.