Fetching Image from URL using BeautifulSoup

 ̄綄美尐妖づ 提交于 2019-12-04 06:05:47

问题


I am trying to fetch important images and not thumbnail or other gifs from the Wikipedia page and using following code. However the "img" is coming as length of "0". any suggestion on how to rectify it.

Code :

import urllib
import urllib2
from bs4 import BeautifulSoup
import os

html = urllib2.urlopen("http://en.wikipedia.org/wiki/Main_Page")

soup = BeautifulSoup(html)

imgs = soup.findAll("div",{"class":"image"})

Also if someone can explain in detail that how to use the findAll by looking at "source element" in webpage. That will be awesome.


回答1:


The a tags on the page have an image class, not div:

>>> img_links = soup.findAll("a", {"class":"image"})
>>> for img_link in img_links:
...     print img_link.img['src']
... 
//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Stora_Kronan.jpeg/100px-Stora_Kronan.jpeg
//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg
...

Or, even better, use a.image > img CSS selector:

>>> for img in soup.select('a.image > img'):
...      print img['src']
//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Stora_Kronan.jpeg/100px-Stora_Kronan.jpeg
//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg 
...

UPD (downloading images using urllib.urlretrieve):

from urllib import urlretrieve
import urlparse
from bs4 import BeautifulSoup
import urllib2

url = "http://en.wikipedia.org/wiki/Main_Page"
soup = BeautifulSoup(urllib2.urlopen(url))
for img in soup.select('a.image > img'):
    img_url = urlparse.urljoin(url, img['src'])
    file_name = img['src'].split('/')[-1]
    urlretrieve(img_url, file_name)



回答2:


I don't see any div tags with a class called 'image' on that page.

You could get all the image tags and throw away ones that are small.

imgs = soup.select('img')


来源:https://stackoverflow.com/questions/24357317/fetching-image-from-url-using-beautifulsoup

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!