Python scraping go to next page using BeautifulSoup [closed]

Posted by 送分小仙女 on 2020-01-06 07:12:59

Question


This is my scraping code:

import requests
from bs4 import BeautifulSoup as soup
def get_emails(_links:list):
    for i in range(len(_links)):
        new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'})
        if new_d:
            yield new_d[-1]['title']

start=20
while True:
    d = soup(requests.get('http://www.schulliste.eu/type/gymnasien/?bundesland=&start=20').text, 'html.parser')

    results = [i['href'] for i in d.find_all('a')][52:-9]
    results = [link for link in results if link.startswith('http://')]
    print(list(get_emails(results)))

    next_page=soup.find('div', {'class': 'paging'}, 'weiter')

    if next_page:

        d=next_page.get('href')
        start+=20
    else:
        break

And that's the error I get: AttributeError: 'str' object has no attribute 'find_all'

When you press the "weiter" (next page) button, the end of the URL changes from "...start=20" to "...start=40". It goes up in steps of 20 because there are 20 results per page. Does anyone know the reason for the error?


Answer 1:


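You stored the parsed page in a variable called 'd'. The name 'soup' is only the imported alias for the BeautifulSoup class, so soup.find(...) is called on the class rather than on the parsed page, and the string 'div' ends up in the place of the instance.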
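The mechanism behind the error message can be reproduced without any scraping. The sketch below uses a hypothetical stand-in class named Page (it is not part of bs4) whose find method delegates to find_all on self, which is assumed here to mirror how the real lookup behaves:

```python
class Page:
    """Hypothetical stand-in for a parsed document (not part of bs4)."""

    def __init__(self, tags):
        self.tags = tags

    def find_all(self, name):
        # Return every stored tag name equal to `name`.
        return [t for t in self.tags if t == name]

    def find(self, name):
        # Delegates to find_all on whatever object is bound to `self`.
        matches = self.find_all(name)
        return matches[0] if matches else None


page = Page(['div', 'a', 'div'])
print(page.find('div'))  # called on an instance: self is the Page object

try:
    # Called on the class: 'div' is bound to self and 'a' to name,
    # so the method tries to call 'div'.find_all(...).
    Page.find('div', 'a')
except AttributeError as exc:
    message = str(exc)
    print(message)
```

The second call fails for the same reason as the code in the question: the first positional argument takes the place of the instance, and a plain string has no find_all method.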

So replace the following line:

next_page=soup.find('div', {'class': 'paging'}, 'weiter')

With this:

next_page = d.find('div', {'class': 'paging'}, 'weiter')
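Two further points may be worth checking once the AttributeError is gone. The third positional argument of find is recursive, not a text filter, so 'weiter' does not select the link. The loop in the question also requests the same hard-coded URL on every pass. Below is a minimal sketch of one way to follow the next-page link. It assumes the div with class 'paging' contains an a element whose text is exactly 'weiter', which has not been verified against the real page. The names find_next_url, iter_pages and fetch are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(html, base_url):
    """Return the absolute URL behind the 'weiter' link, or None if absent."""
    page = BeautifulSoup(html, 'html.parser')
    paging = page.find('div', {'class': 'paging'})
    if paging is None:
        return None
    # Assumption: the next-page link is an <a> whose text is exactly 'weiter'.
    link = paging.find('a', string='weiter')
    if link is None or not link.get('href'):
        return None
    # Resolve relative hrefs such as '?start=40' against the current URL.
    return urljoin(base_url, link['href'])


def iter_pages(fetch, first_url, max_pages=100):
    """Yield (url, html) pairs, following 'weiter' links until none is left.

    `fetch` is any callable that takes a URL and returns the page's HTML.
    `max_pages` is a safety cap against endless loops.
    """
    url = first_url
    for _ in range(max_pages):
        html = fetch(url)
        yield url, html
        url = find_next_url(html, url)
        if url is None:
            break
```

With requests installed, fetch could be something like lambda url: requests.get(url).text, and each yielded html can be parsed the same way the question parses 'd'.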


Source: https://stackoverflow.com/questions/52719131/python-scraping-go-to-next-page-using-beautifulsoup