CSS selectors to be used for scraping specific links

ぃ、小莉子 提交于 2019-12-24 07:29:32

问题


I am new to Python and working on a scraping project. I am using Firebug to copy the CSS path of required links. I am trying to collect the links under the tab of "UPCOMING EVENTS" from http://kiascenehai.pk/ but it is just for learning how I can get the specified links.

I am looking for the fix of this problem and also suggestions for how to retrieve specified links using CSS selectors.

from bs4 import BeautifulSoup
import requests

url = "http://kiascenehai.pk/"

r  = requests.get(url)

data = r.text

soup = BeautifulSoup(data)

for link in soup.select("html body div.body-outer-wrapper div.body-wrapper.boxed-mode div.main-     outer-wrapper.mt30 div.main-wrapper.container div.row.row-wrapper div.page-wrapper.twelve.columns.b0 div.row div.page-wrapper.twelve.columns div.row div.eight.columns.b0 div.content.clearfix section#main-content div.row div.six.columns div.small-post-wrapper div.small-post-content h2.small-post-title a"):
    print  link.get('href')

回答1:


First of all, that page requires a city selection to be made (in a cookie). Use a Session object to handle this:

s = requests.Session()
s.post('http://kiascenehai.pk/select_city/submit_city', data={'city': 'Lahore'})
response = s.get('http://kiascenehai.pk/')

Now the response gets the actual page content, not redirected to the city selection page.

Next, keep your CSS selector no larger than needed. In this page there isn't much to go on as it uses a grid layout, so we first need to zoom in on the right rows:

upcoming_events_header = soup.find('div', class_='featured-event')
upcoming_events_row = upcoming_events_header.find_next(class_='row')

for link in upcoming_events_row.select('h2 a[href]'):
    print link['href']



回答2:


This is co-founder KiaSceneHai.pk; please don't scrape websites, alot of effort goes into collecting the data, we offer access through our API, you can use the contact form to request access, ty



来源:https://stackoverflow.com/questions/24789094/css-selectors-to-be-used-for-scraping-specific-links

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!