How to download images programmatically from Wikimedia Commons without registering for a Bot account?

你的背包 2021-02-05 10:56

It seems like the only way to get approval for a Bot account is if it adds to or edits information already on Wikimedia. If you try to download any images, without a bot account

5 Answers
  • 2021-02-05 11:36

    I didn't really find the answer I was looking for, but this page is interesting: http://www.makeuseof.com/tag/4-free-tools-for-taking-wikipedia-offline/

    Especially #4, but the page seems to be down. Is the project dead?

  • 2021-02-05 11:43

    If you need between ten and one million files, using Magnus Manske's tools to recurse categories is a good choice. http://tools.wmflabs.org/magnustools/can_i_haz_files.html produces a list of UNIX commands which you can then just run locally.

    An alternative, whose interface is in German only but is easy enough to use, is https://tools.wmflabs.org/wikilovesdownloads/

  • 2021-02-05 11:48

    Note that there used to be an issue with using LWP. The reason is not ideological, it's practical: automated agents can create massive load on already stretched servers. There are sensible strategies that agent users can follow to reduce the load; ask on www.mediawiki.org, or en:Village pump - Technical.

  • 2021-02-05 11:49

    Try explaining exactly what you want to do? And what you've tried? What error message did you get? You're not very clear...

    What libraries have you tried? If you're not aggressive, there are no restrictions in downloading WM content. I've never heard of any restrictions. Some User-Agents are banned from editing to avoid stupid spamming, but really, I've never heard of downloading restrictions.

    If you are trying to scrape a massive amount of images, downloading them through Commons, you're doing it wrong (tm). If you are trying to get a few images, anywhere from 10 to 200, you should be able to write a decent tool in a few lines of code, provided that you are respecting the throttling requirement: when the API tells you to slow down, if you don't do it, sysadmins are likely to kick you out.

    If you need a complete image dump (we're talking about a few TBs), try asking on wikitech-l. We had torrents available when there were fewer images; now it's more complicated, but still doable.

    About bot accounts: how deep have you looked into the system? You need a bot account for fast, unsupervised edits. Bot privileges also unlock a few facilities such as increased query sizes. But remember, a bot account is simply an augmented user account. Have you tried running anything with a regular account?
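
The throttling advice in this answer (slow down when the server asks you to) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `USER_AGENT` string, the one-second pause, and the choice to treat HTTP 429 and 503 as "slow down" signals are all mine, not something the answer or Wikimedia prescribes. The `fetch` and `sleep` parameters are injectable so the loop can be tried without touching the network.

```python
import time
import urllib.error
import urllib.request

# Hypothetical identifier; replace with your own tool name and contact details.
USER_AGENT = "ExampleImageFetcher/0.1 (contact: you@example.org)"


def http_get(url):
    """Fetch a URL and return its bytes (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


def retry_delay(error, default=5.0):
    """Seconds to wait after an HTTPError, using a numeric Retry-After if present."""
    value = error.headers.get("Retry-After") if error.headers is not None else None
    try:
        return max(float(value), 0.0)
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date: fall back to the default.
        return default


def polite_download(urls, fetch=http_get, sleep=time.sleep, pause=1.0, max_attempts=3):
    """Fetch URLs one at a time, pausing between files and backing off on 429/503."""
    results = {}
    for url in urls:
        for attempt in range(max_attempts):
            try:
                results[url] = fetch(url)
                break
            except urllib.error.HTTPError as error:
                # Give up on other errors, or once the attempts are used up.
                if error.code not in (429, 503) or attempt == max_attempts - 1:
                    raise
                sleep(retry_delay(error))
        sleep(pause)  # fixed gap between files; an assumed value, not an official limit
    return results
```

Requests are deliberately sequential here; for the 10 to 200 images mentioned above, that is usually fast enough and keeps the load on the servers low.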

  • 2021-02-05 12:00

    Having just done this myself I feel I should share:

    http://www.mediawiki.org/wiki/API:Allimages

    This API document does state that you can query the images:

    http://en.wikipedia.org/w/api.php?action=query&list=allimages&aiprop=url&format=xml&ailimit=10&aifrom=Albert

    With aiprop=url, the response includes the URL of each image you are looking for.
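
As a rough illustration of the query above, here is a minimal Python sketch that builds the same `list=allimages` request and pulls the `url` field out of the response. It asks for `format=json` rather than the `format=xml` used in the answer, purely for easier parsing. The function names and the `USER_AGENT` string are hypothetical, and the endpoint is the one from the answer (with https); for Commons itself you would presumably point `api_url` at `https://commons.wikimedia.org/w/api.php`.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the answer above; pass a different api_url for other wikis.
API_URL = "https://en.wikipedia.org/w/api.php"

# Hypothetical identifier; replace with your own tool name and contact details.
USER_AGENT = "ExampleImageFetcher/0.1 (contact: you@example.org)"


def build_allimages_url(start_from, limit=10, api_url=API_URL):
    """Return the list=allimages query URL, requesting JSON output."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "url",
        "format": "json",
        "ailimit": limit,
        "aifrom": start_from,
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def extract_image_urls(payload):
    """Collect the 'url' field of each entry in a decoded API response."""
    return [item["url"] for item in payload.get("query", {}).get("allimages", [])]


def fetch_image_urls(start_from, limit=10):
    """Run the query (needs network access) and return the image URLs."""
    request = urllib.request.Request(
        build_allimages_url(start_from, limit),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_image_urls(json.load(response))
```

Calling `fetch_image_urls("Albert")` would correspond to the example request in the answer; each returned URL can then be downloaded with an ordinary HTTP GET.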
