I've written a script in Python in combination with Selenium to download a few document files (ending with .doc) from a webpage. The reason I do not wish to use request
Use this code when declaring the driver object. (This is Java; Python has a similar way to accomplish it.) It makes Chrome download files to the specified location every time.
//Create preference object
HashMap<String, Object> chromePrefs = new HashMap<String , Object>();
//Set Download path
chromePrefs.put("download.default_directory","C:\\Reports\\AutomaionDownloads");
chromePrefs.put("download.directory_upgrade", true);
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("prefs", chromePrefs);
//Call the Chrome Driver
WebDriver driver = new ChromeDriver(options);
I just added a rename of the file to move it. So it'll work just as you have it, but once the file has downloaded, it will be moved to the correct path:
os.rename(desk_location + '\\' + filename, file_location)
Full Code:
import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

link = 'https://www.online-convert.com/file-format/doc'

dirf = os.path.expanduser('~')
desk_location = dirf + r'\Desktop\file_folder'
if not os.path.exists(desk_location):
    os.mkdir(desk_location)

def download_files():
    driver.get(link)
    # Selenium 4 syntax; older versions used find_elements_by_css_selector
    for item in driver.find_elements(By.CSS_SELECTOR, "a[href$='.doc']")[:2]:
        filename = item.get_attribute("href").split("/")[-1]
        # create a folder named after the file to hold the downloaded file
        folder_name = filename.split(".")[0]
        # set the location of the new folder
        new_location = os.path.join(desk_location, folder_name)
        if not os.path.exists(new_location):
            os.mkdir(new_location)
        # final location of the downloaded file inside its folder
        file_location = os.path.join(new_location, filename)
        item.click()

        # Chrome saves into desk_location first, so wait for the file to appear there
        downloaded_file = desk_location + '\\' + filename
        time_to_wait = 10
        time_counter = 0
        try:
            while not os.path.exists(downloaded_file):
                time.sleep(1)
                time_counter += 1
                if time_counter > time_to_wait: break
            os.rename(downloaded_file, file_location)
        except Exception as e:
            # report the failure instead of silently ignoring it
            print('Could not move', filename, e)

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    prefs = {'download.default_directory': desk_location,
             'profile.default_content_setting_values.automatic_downloads': 1
             }
    chromeOptions.add_experimental_option('prefs', prefs)
    # Selenium 4 takes options=; older versions used chrome_options=
    driver = webdriver.Chrome(options=chromeOptions)
    download_files()
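The wait-then-move step above can also be pulled out into a small helper so it is easier to reuse and to test without a browser. This is only a sketch using the standard library: the names `wait_for_file` and `move_when_downloaded` and the default timeout values are my own, not part of Selenium. It relies on the file showing up under its final name only once the download has finished, which is how Chrome behaves as far as I know (it writes a temporary `.crdownload` file first).

```python
import os
import shutil
import time


def wait_for_file(path, timeout=10.0, poll_interval=0.5):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    # one last check in case the file appeared during the final sleep
    return os.path.exists(path)


def move_when_downloaded(download_dir, filename, target_dir, timeout=10.0):
    """Wait for `filename` to appear in `download_dir`, then move it into `target_dir`.

    Returns the new path, or None if the file never showed up.
    """
    source = os.path.join(download_dir, filename)
    if not wait_for_file(source, timeout=timeout):
        return None
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, filename)
    shutil.move(source, destination)
    return destination
```

In the script above, the `try` block could then be replaced by something like `move_when_downloaded(desk_location, filename, new_location)` right after `item.click()`, checking for a `None` result to detect a download that timed out.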
Use the pathlib library in Python 3, or the pathlib2 library for Python 2, to handle paths. It gives you an object-oriented way to work with files and directories. It also has the PurePath
class, which can work with paths without even touching the filesystem.
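As a rough illustration of that suggestion, here is how the filename, the per-file folder name and the final target path could be worked out with pure paths only. The link, the base folder and the function name `plan_destination` are made up for the example. Note that `.stem` drops only the last extension, which is slightly different from `split(".")[0]` when a name contains several dots.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlparse


def plan_destination(href, base_dir):
    """Return (filename, target_path) for a download link, without touching the disk."""
    # URL paths always use forward slashes, so parse them as POSIX paths
    filename = PurePosixPath(urlparse(href).path).name
    # folder named after the file, minus its last extension
    folder_name = PurePosixPath(filename).stem
    return filename, base_dir / folder_name / filename


# Hypothetical link and base folder, for illustration only.
href = "https://example.com/files/example.doc"
base_dir = PureWindowsPath(r"C:\Users\me\Desktop\file_folder")

filename, target = plan_destination(href, base_dir)
print(filename)  # example.doc
print(target)    # C:\Users\me\Desktop\file_folder\example\example.doc
```

With a concrete `Path` instead of a pure path, `target.parent.mkdir(parents=True, exist_ok=True)` would create the folder in one call, replacing the `os.path.exists` / `os.mkdir` pair.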