How to read a file downloaded by selenium webdriver in python

遇见更好的自我 2021-01-07 08:13

I am using Selenium WebDriver in Python to download a CSV file from a site. The file is downloaded into the specified download directory. Here is an overview of my c

3 Answers
  • 2021-01-07 08:50

    You can get the last downloaded file from that location and then read the file:

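    import os

    path = "/path/to/folder"  # the download directory
    # os.listdir returns bare names, so build full paths before asking for mtimes
    files = [os.path.join(path, name) for name in os.listdir(path)]
    file_name = max(files, key=os.path.getmtime)  # most recently modified entry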
    

    You can then read from this file. This assumes that parallel processes are not writing other files into the same folder.
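
    The approach above can be sketched as a pair of helpers, assuming Python 3 and a download folder that only this script writes to. The names `newest_file` and `read_csv_rows` are hypothetical, and the `.part` and `.crdownload` suffixes are the temporary names Firefox and Chrome give to unfinished downloads; other browsers may differ.

```python
import csv
import os
import time

# Suffixes browsers use for downloads still in progress (Firefox, Chrome).
PARTIAL_SUFFIXES = (".part", ".crdownload")


def newest_file(folder, timeout=30.0, poll=0.5):
    """Return the most recently modified finished file in `folder`.

    Polls until at least one file exists and no partial download is left,
    and raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        paths = [os.path.join(folder, name) for name in os.listdir(folder)]
        files = [p for p in paths if os.path.isfile(p)]
        in_progress = [p for p in files if p.endswith(PARTIAL_SUFFIXES)]
        if files and not in_progress:
            return max(files, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in " + folder)
        time.sleep(poll)


def read_csv_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

    Calling `read_csv_rows(newest_file("/path/to/folder"))` after the click that starts the download would then return the parsed rows. The UTF-8 encoding is an assumption about the file.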

    EDIT: I just saw the comment that multiple instances are downloading at once. In that case you can instead download the file yourself from its URL with urllib:

    import urllib.request

    # Python 3: urlretrieve lives in urllib.request; give each download a unique file name
    urllib.request.urlretrieve("http://www.example.com/yourfile.ext", "your-file-name.ext")
    
  • 2021-01-07 08:52

    This answer combines previous Stack Overflow questions and answers with comments in this post, so thank you, everyone.

    I combined Selenium WebDriver and the Python requests module for this solution. I logged into the site using Selenium, copied the cookies from the WebDriver session, and then used requests.get(url, cookies=webdriver_cookies) to get the file.

    Here's the gist of my solution:

    import requests
    from selenium import webdriver

    # Firefox preferences: save downloads to a fixed folder without prompting
    options = webdriver.FirefoxOptions()
    options.set_preference("browser.download.folderList", 2)  # 2 = use browser.download.dir
    options.set_preference("browser.download.manager.showWhenStarting", False)
    options.set_preference("browser.download.dir", "xx/yy")
    options.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
    driver = webdriver.Firefox(options=options)
    
    # selenium login code ...
    
    # Copy the session cookies from the browser into a plain dict for requests
    cookies_copy = {c["name"]: c["value"] for c in driver.get_cookies()}
    r = requests.get('url', cookies=cookies_copy)
    print(r.text)
    

    I hope that this helps someone.
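
    Since the original question is about reading the CSV, here is a minimal sketch of the same cookie hand-off that also parses the response. It swaps requests for the standard library's urllib.request so that it has no third-party dependency. The helper names are hypothetical, and the UTF-8 encoding and the presence of a header row are assumptions.

```python
import csv
import io
import urllib.request


def cookie_header(driver_cookies):
    """Build a Cookie header value from the dicts returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in driver_cookies)


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv_rows(url, driver_cookies, encoding="utf-8"):
    """Download `url` with the browser session's cookies and parse it as CSV."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(driver_cookies)}
    )
    with urllib.request.urlopen(request) as response:
        return parse_csv_text(response.read().decode(encoding))
```

    After the Selenium login, `fetch_csv_rows(url, driver.get_cookies())` would return the rows without the file ever touching the download directory, which avoids the problem of parallel instances sharing one folder.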

  • 2021-01-07 08:58

    Downloading files in Selenium is never a good idea. You cannot control where and under which filename the file is saved, and finding out afterwards requires dirty hacks. The result depends on the browser, its settings, and whether the same file has already been downloaded before.

    Plus, you have to delete the file after the download, because otherwise copies of the same file will pile up on your hard drive until it is completely full.

    If possible, you should read the download URL from the element instead, with something like this (C# bindings shown):

    string downloadUrl = ButtonDownloadPdf.GetAttribute("href");
    

    and then handle the downloading yourself, using conventional methods, not Selenium.
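
    In Python, the same idea might look like the sketch below. The function name `download_link_target` is hypothetical. It assumes the element behaves like a Selenium WebElement, meaning it offers `get_attribute("href")`, and that the URL can be fetched without authentication.

```python
import shutil
import urllib.request


def download_link_target(link_element, destination):
    """Download the URL in the element's href attribute to `destination`.

    `link_element` is expected to behave like a Selenium WebElement,
    i.e. to offer get_attribute("href").
    """
    url = link_element.get_attribute("href")
    if not url:
        raise ValueError("element has no href attribute")
    # Stream the response body straight into the destination file.
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

    Because the caller chooses `destination`, the file name and location are known in advance and each parallel instance can use its own. If the link sits behind a login, the session cookies have to be sent with the request as well.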
