Download pdf using urllib?

北海茫月 2020-12-01 05:31

I am trying to download a PDF file from a website using urllib. This is what I have so far:

import urllib

def download_file(download_url):
    web_file = url         


        
5 answers
  • 2020-12-01 06:14

    I would suggest using the following lines of code:

    import shutil
    import urllib.request

    url = "link to your website for pdf file to download"
    output_file = "local directory://name.pdf"
    # Stream the response straight into a file opened in binary mode.
    with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
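
A common failure with this approach is ending up with an HTML error page saved under a .pdf name. The sketch below wraps the same urlopen/copyfileobj pattern in a function and adds a cheap sanity check on the result. The names download_pdf and looks_like_pdf and the 30-second timeout are assumptions made for illustration, not part of the answer above.

```python
import shutil
import urllib.request
from pathlib import Path


def download_pdf(url, output_file, timeout=30):
    """Stream the response at `url` into `output_file` and return its path."""
    output_path = Path(output_file)
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(output_path, "wb") as out_file:
        # copyfileobj copies in chunks, so the whole PDF is never held in memory.
        shutil.copyfileobj(response, out_file)
    return output_path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature '%PDF-'."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```

Calling looks_like_pdf on the saved file right after download_pdf gives an early hint that the server returned something other than a PDF.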
    
  • 2020-12-01 06:20

    Try urllib.request.urlretrieve (Python 3):

    from urllib.request import urlretrieve
    
    def download_file(download_url):
        urlretrieve(download_url, 'path_to_save_plus_some_file.pdf')
    
    if __name__ == '__main__':
        download_file('http://www.example.com/some_file.pdf')
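
urlretrieve also returns something useful: a tuple of the local file name and the response headers. A minimal sketch, assuming you want to check the reported content type after the download; the two-argument download_file signature here is a hypothetical variation on the function above. Note that the Python documentation lists urlretrieve under the legacy interface of urllib.request.

```python
from urllib.request import urlretrieve


def download_file(download_url, target_path):
    """Download `download_url` to `target_path`.

    Returns the local path and the content type reported in the headers.
    """
    # urlretrieve returns (local file name, headers of the response).
    local_path, headers = urlretrieve(download_url, target_path)
    return local_path, headers.get_content_type()
```

If the returned content type is not application/pdf, the server may have sent an error page instead of the document.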
    
  • 2020-12-01 06:23

    I tried the code above. It works fine in some cases, but for some websites with an embedded PDF you might get an error like HTTPError: HTTP Error 403: Forbidden. Such websites have server-side security features that block known bots. By default urllib sends a User-Agent header that identifies it, something like Python-urllib/3.3. So I would suggest adding a custom header through urllib's Request class, as shown below.

    from urllib.request import Request, urlopen

    url = "https://realpython.com/python-tricks-sample-pdf"
    # Send a browser-like User-Agent instead of urllib's default one.
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

    with urlopen(req) as response, open("<location to dump pdf>/<name of file>.pdf", "wb") as out_file:
        out_file.write(response.read())
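
To make the 403 case explicit, here is a minimal sketch using only urllib. The names build_request, download_with_user_agent, and BROWSER_USER_AGENT are hypothetical, and 'Mozilla/5.0' is an assumed value; some servers may need a fuller browser string or block the request for other reasons.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: a short browser-like string is enough for the target server.
BROWSER_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that sends `user_agent` instead of urllib's default."""
    return Request(url, headers={"User-Agent": user_agent})


def download_with_user_agent(url, output_file):
    """Save `url` to `output_file`; return True on success, False on HTTP 403."""
    try:
        # urlopen runs first, so no file is created if the server refuses.
        with urlopen(build_request(url)) as response, \
                open(output_file, "wb") as out_file:
            out_file.write(response.read())
    except HTTPError as err:
        if err.code == 403:
            return False
        raise  # other HTTP errors are left for the caller to handle
    return True
```

Returning False on 403 is just one design choice; re-raising with a clearer message would work equally well.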
    
  • 2020-12-01 06:26

    Change open('some_file.pdf', 'w') to open('some_file.pdf', 'wb'). PDF files are binary files, so you need the 'b'. This is true of pretty much any file that you can't open in a text editor.
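
The difference is easy to see in a small experiment. In Python 3, writing bytes to a file opened with 'w' raises a TypeError, while 'wb' stores the bytes unchanged. The helper try_write and the sample payload are made up for this illustration.

```python
import os
import tempfile


def try_write(path, data, mode):
    """Write `data` to `path` using `mode`; return the TypeError raised, or None."""
    try:
        with open(path, mode) as f:
            f.write(data)
    except TypeError as err:
        return err
    return None


if __name__ == "__main__":
    # Stand-in for the bytes returned by response.read().
    pdf_bytes = b"%PDF-1.4\n\x00\xff binary payload"
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "some_file.pdf")
        print("text mode:", try_write(path, pdf_bytes, "w"))
        print("binary mode:", try_write(path, pdf_bytes, "wb"))
```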

  • 2020-12-01 06:34

    Here is an example that works (Python 2; in Python 3, urllib2 was split into urllib.request and urllib.error):

    import urllib2
    
    def main():
        download_file("http://mensenhandel.nl/files/pdftest2.pdf")
    
    def download_file(download_url):
        response = urllib2.urlopen(download_url)
        # Open in binary mode; the with block closes the file even on error.
        with open("document.pdf", 'wb') as out_file:
            out_file.write(response.read())
        print("Completed")
    
    if __name__ == "__main__":
        main()
    