How to get only first class' data between two same classes

Posted by 房东的猫 on 2019-12-12 12:33:56

Question


On the https://www.hltv.org/matches page, matches are grouped by date, but every date group uses the same class. For example:

This is the block for today's matches:

<div class="match-day"><div class="standard-headline">2018-05-01</div>

This is the block for tomorrow's matches:

<div class="match-day"><div class="standard-headline">2018-05-02</div>

What I'm trying to do is get the links under the "standard-headline" class, but only for today's matches. In other words, I want only the first block.

Here is my code:

import urllib.request
from bs4 import BeautifulSoup

headers = {}  # Request headers describe the client: operating system, browser, etc.
headers['User-Agent'] = 'Mozilla/5.0'  # Set a user agent because HLTV otherwise treats the connection as a bot.
hltv = urllib.request.Request('https://www.hltv.org/matches', headers=headers)  # Build the request for the page.
session = urllib.request.urlopen(hltv)
sauce = session.read()  # Read the page source.
soup = BeautifulSoup(sauce, 'lxml')

matchlinks = []
# Collect the match pages' links.
for section in soup.find_all('div', class_='upcoming-matches'):  # Each "upcoming-matches" block in the source.
    for link in section.find_all('a'):  # Every "a" tag inside that block (search the block, not the whole soup).
        clearlink = link.get('href')  # The href attribute, or None if the tag has none.
        if clearlink and clearlink.startswith('/matches/'):  # Keep only links that start with "/matches/".
            matchlinks.append('https://hltv.org' + clearlink)  # Add the absolute URL to the list.

Answer 1:


The website shows today's matches first (at the top), followed by those of the following days. So, if you want today's matches, you can simply use find(), which returns only the first matching tag.

Using this will give you what you want:

today = soup.find('div', class_='match-day')
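As a minimal sketch of how this plays out, the snippet below runs find() on a small hand-written HTML sample and then collects the match links from inside that first block only. The sample markup is an assumption and the real HLTV page may be structured differently; the helper name links_in_first_day is hypothetical as well.

```python
from bs4 import BeautifulSoup

# Hand-written sample standing in for the real page (assumed structure).
SAMPLE_HTML = """
<div class="upcoming-matches">
  <div class="match-day"><span class="standard-headline">2018-05-01</span>
    <a href="/matches/1/team-a-vs-team-b">A vs B</a>
    <a href="/events/9/some-event">Event</a>
  </div>
  <div class="match-day"><span class="standard-headline">2018-05-02</span>
    <a href="/matches/2/team-c-vs-team-d">C vs D</a>
  </div>
</div>
"""


def links_in_first_day(html):
    """Return absolute match links found in the first match-day block."""
    soup = BeautifulSoup(html, "html.parser")
    first_day = soup.find("div", class_="match-day")  # first block only
    if first_day is None:  # the page has no match-day block at all
        return []
    return [
        "https://hltv.org" + a["href"]
        for a in first_day.find_all("a", href=True)  # skip <a> tags without href
        if a["href"].startswith("/matches/")
    ]


print(links_in_first_day(SAMPLE_HTML))
```

Searching from first_day rather than from soup is what keeps the links of later days out of the result.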

If you want to specify the date explicitly instead, you can find the tag containing that date by passing text='2018-05-02' to the find() method. Note that in the page source the tag is <span class="standard-headline">2018-05-02</span>, not a <div> tag. After getting this tag, use .parent to get the enclosing <div class="match-day"> tag.

today = soup.find('span', text='2018-05-02').parent
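One thing to watch for: find() returns None when no headline carries the given date, so calling .parent directly would raise an AttributeError. The sketch below guards against that on a hand-written sample (assumed markup, not the real page). The helper name match_day_for is hypothetical, and it uses the string= argument, which newer Beautiful Soup versions accept in place of text=.

```python
from bs4 import BeautifulSoup

# Hand-written sample standing in for the real page (assumed structure).
SAMPLE_HTML = """
<div class="match-day"><span class="standard-headline">2018-05-01</span>
  <a href="/matches/1/team-a-vs-team-b">A vs B</a></div>
<div class="match-day"><span class="standard-headline">2018-05-02</span>
  <a href="/matches/2/team-c-vs-team-d">C vs D</a></div>
"""


def match_day_for(soup, date_text):
    """Return the match-day <div> whose headline equals date_text, or None."""
    headline = soup.find("span", class_="standard-headline", string=date_text)
    # .parent climbs from the <span> headline to its enclosing <div>.
    return headline.parent if headline is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
day = match_day_for(soup, "2018-05-02")
print([a["href"] for a in day.find_all("a")])
print(match_day_for(soup, "2018-05-03"))  # no such headline in the sample
```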

To make the solution more generic, you can use datetime.date.today() instead of the hard-coded date.

today = soup.find('span', text=str(datetime.date.today())).parent

You'll have to import the datetime module for this. str() converts the date object to a 'YYYY-MM-DD' string, the same format as the headline text.
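A small sketch of the date-based lookup follows. The function name match_day_on and the sample markup are assumptions for illustration. datetime.date.today() uses the local date of the machine running the script, which may not be the date the site considers "today" if the two are in different time zones.

```python
import datetime
from bs4 import BeautifulSoup


def match_day_on(soup, day=None):
    """Return the match-day block for `day` (default: today's local date), or None."""
    if day is None:
        day = datetime.date.today()
    # isoformat() yields 'YYYY-MM-DD', the format used in the headlines above.
    headline = soup.find("span", string=day.isoformat())
    return headline.parent if headline is not None else None


# Sample page with fixed dates so the result is reproducible (assumed markup).
fixed_day = datetime.date(2018, 5, 2)
html = (
    '<div class="match-day"><span class="standard-headline">2018-05-01</span></div>'
    '<div class="match-day"><span class="standard-headline">2018-05-02</span>'
    '<a href="/matches/2/team-c-vs-team-d">C vs D</a></div>'
)
soup = BeautifulSoup(html, "html.parser")
block = match_day_on(soup, fixed_day)
print(block.a["href"])
```

Accepting the date as a parameter keeps the lookup testable against a saved copy of the page, while calling match_day_on(soup) with no date falls back to today.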



Source: https://stackoverflow.com/questions/50120344/how-to-get-only-first-class-data-between-two-same-classes
