Batch downloading text and images from URL with Python / urllib / beautifulsoup?

Submitted by 独自空忆成欢 on 2019-12-03 08:39:27

Your operating system can't write to the file path you are passing it in `src`. Make sure that the name you use to save the file to disk is one the OS can actually use:

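```python
import os

src = "abc.com/alpha/beta/charlie.jpg"
with open(src, "wb") as f:
    # IOError (FileNotFoundError on Python 3) - cannot open file abc.com/alpha/beta/charlie.jpg
    pass

src = "alpha/beta/charlie.jpg"
# Create the parent directories first; exist_ok=True avoids an error if they already exist
os.makedirs(os.path.dirname(src), exist_ok=True)
with open(src, "wb") as f:
    # Golden - write file here
    pass
```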

and everything will start working.
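A minimal sketch of the fix, assuming each image `src` is an absolute URL such as `http://abc.com/alpha/beta/charlie.jpg`. The helper names `local_path_for` and `save_bytes` are hypothetical; the sketch uses only the standard library (`urllib.parse.urlsplit`, `urllib.parse.unquote`, `os.makedirs`) to drop the scheme and host, keep the URL path, and create the parent directories before the file is opened.

```python
import os
from urllib.parse import unquote, urlsplit


def local_path_for(src):
    """Hypothetical helper: turn an image URL into a relative path the OS can open.

    "http://abc.com/alpha/beta/charlie.jpg" -> "alpha/beta/charlie.jpg"
    (joined with the platform's separator via os.path.join).
    """
    # Keep only the path component; scheme, host, query and fragment are dropped
    path = unquote(urlsplit(src).path)
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError(f"no file name in URL: {src!r}")
    return os.path.join(*parts)


def save_bytes(src, data):
    """Write data to the path derived from src, creating directories as needed."""
    path = local_path_for(src)
    parent = os.path.dirname(path)
    if parent:  # os.makedirs("") would raise, so skip it for bare file names
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For example, `save_bytes("http://abc.com/alpha/beta/charlie.jpg", data)` writes to `alpha/beta/charlie.jpg` relative to the current directory. It does not reject `..` segments in the URL path, so it is worth combining with a fixed save root, as the first point below suggests.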

A couple of additional thoughts:

  1. Make sure to normalize the save file path (e.g. `os.path.join(some_root_dir, relative_file_path)`); otherwise you'll be writing images all over your hard drive depending on their `src`.
  2. Unless you are running tests of some kind, it's good to advertise that you are a bot in your user-agent string and to honor robots.txt files (or, alternatively, to provide some kind of contact information so people can ask you to stop if they need to).
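Both points can be sketched with the standard library. The helper names `safe_join`, `robots_for` and `polite_get`, the `ExampleImageBot/0.1` user agent and the contact URL are all hypothetical; `urllib.robotparser.RobotFileParser` and `urllib.request.Request` are real standard-library APIs. `safe_join` resolves the path under a fixed root and refuses anything that escapes it (for example `../` segments or an absolute path).

```python
import os
import urllib.request
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string: identifies the bot and gives a contact point
USER_AGENT = "ExampleImageBot/0.1 (+https://example.com/contact)"


def safe_join(root, relative_path):
    """Join relative_path under root, refusing results that land outside root."""
    root = os.path.abspath(root)
    full = os.path.abspath(os.path.join(root, relative_path))
    if os.path.commonpath([root, full]) != root:
        raise ValueError(f"refusing to write outside {root}: {relative_path!r}")
    return full


def robots_for(url):
    """Download and parse robots.txt for the site that serves url."""
    parts = urlsplit(url)
    robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
    parser = RobotFileParser(robots_url)
    parser.read()
    return parser


def polite_get(url, robots):
    """Fetch url with the bot's User-Agent, but only if robots.txt allows it."""
    if not robots.can_fetch(USER_AGENT, url):
        raise PermissionError(f"robots.txt disallows {url}")
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Typical use: call `robots_for(page_url)` once per site, then for each image call `polite_get(img_url, robots)` and write the bytes to `safe_join(save_root, relative_path)`.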