I'm making a small web crawler using Python 3.5.1 and the requests module, which downloads all comics from a specific website. I'm experimenting with one page. I parse the page
I would do it in one go using a CSS selector:
for img in soup.select("a.img-link img[src]"):
    print(img["src"])
Here, we are getting all of the img elements that have an src attribute and are located under an a element with an img-link class. It prints:
http://2.p.mpcdn.net/352582/687224/1.jpg
http://2.p.mpcdn.net/352582/687224/2.jpg
http://2.p.mpcdn.net/352582/687224/3.jpg
http://2.p.mpcdn.net/352582/687224/4.jpg
...
http://2.p.mpcdn.net/352582/687224/20.jpg
If you still want to use find_all(), you would have to nest the calls:
for link in soup.find_all("a", class_="img-link"):
    # searching for img elements with an src attribute inside each link
    for img in link.find_all("img", src=True):
        print(img["src"])
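To check that the two approaches pick the same elements, here is a minimal sketch run against a small inline document. The markup, the example.com URLs, and the helper names sources_with_select and sources_with_find_all are made up for illustration; the real page will look different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption, not the actual site).
SAMPLE_HTML = """
<div>
  <a class="img-link" href="/page/1"><img src="http://example.com/1.jpg"></a>
  <a class="img-link" href="/page/2"><img src="http://example.com/2.jpg"></a>
  <a class="img-link" href="/page/3"><img alt="no src here"></a>
  <a class="other" href="/ad"><img src="http://example.com/ad.jpg"></a>
</div>
"""

# "html.parser" is the parser from the standard library, so no extra parser is needed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def sources_with_select(soup):
    """Collect src values with a single CSS selector."""
    return [img["src"] for img in soup.select("a.img-link img[src]")]


def sources_with_find_all(soup):
    """Collect the same src values with nested find_all() calls."""
    return [
        img["src"]
        for link in soup.find_all("a", class_="img-link")
        for img in link.find_all("img", src=True)
    ]


css_sources = sources_with_select(soup)
nested_sources = sources_with_find_all(soup)

print(css_sources)
print(css_sources == nested_sources)
```

Both helpers skip the img without an src attribute and the img under the link that lacks the img-link class, so on this sample they return the same two URLs.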