Failed to grab dates in a customized manner out of tabular content

醉酒成梦 2021-01-07 13:34

I've written a script in Python, using Selenium, to parse some dates from a table on a webpage. The table is located under the header NPL Vict

3 answers
  •  小鲜肉 (OP)
    2021-01-07 14:26

    I'm not using Selenium, but the dates can be extracted with just BeautifulSoup. Each date and time is encoded as a Unix timestamp in one of the cell's class names:

    from bs4 import BeautifulSoup
    import requests
    import re
    import datetime
    
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
    r = requests.get('http://www.oddsportal.com/soccer/australia/npl-victoria/', headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')
    
    # Each date cell carries a class that starts with "t" followed by a Unix timestamp.
    for td in soup.select('table#tournamentTable td.datet'):
        for c in td['class']:
            m = re.match(r't(\d+)', c)
            if m:
                unix_timestamp = int(m[1])
                # Timezone-aware UTC conversion; utcfromtimestamp() is deprecated since Python 3.12.
                d = datetime.datetime.fromtimestamp(unix_timestamp, tz=datetime.timezone.utc).strftime('%d %b %Y--%H:%M')
                print(d)
    

    Prints:

    10 Aug 2018--09:30
    10 Aug 2018--10:15
    11 Aug 2018--05:00
    11 Aug 2018--05:00
    11 Aug 2018--09:00
    12 Aug 2018--06:00
    12 Aug 2018--06:00
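
    The decoding step can be tried offline, without fetching the page. The following is a minimal sketch, not part of the original answer: the helper name `kickoff_from_classes` and the sample class list are hypothetical, chosen only to resemble what `td['class']` returns for a date cell.

```python
import datetime
import re


def kickoff_from_classes(classes):
    """Return the UTC kickoff time encoded in a list of CSS class names, or None."""
    for name in classes:
        # A matching class starts with "t" followed by the Unix timestamp digits.
        m = re.match(r't(\d+)', name)
        if m:
            ts = int(m.group(1))
            moment = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
            return moment.strftime('%d %b %Y--%H:%M')
    return None


# Hypothetical class list (an assumption, not copied from the live page).
sample = ['table-time', 'datet', 't1533893400-1-1-1-1']
print(kickoff_from_classes(sample))  # 10 Aug 2018--09:30
```

    Classes such as `table-time` and `datet` also begin with "t" or contain it, but they are skipped because the pattern requires digits right after the leading "t".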
    

    If you also want the matches printed:

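    for td in soup.select('table#tournamentTable td.datet'):
        for c in td['class']:
            m = re.match(r't(\d+)', c)
            if m:
                unix_timestamp = int(m[1])
                # Timezone-aware UTC conversion; utcfromtimestamp() is deprecated since Python 3.12.
                d = datetime.datetime.fromtimestamp(unix_timestamp, tz=datetime.timezone.utc).strftime('%d %b %Y--%H:%M')
                print(d, end=' ')
                # The cell that follows the date cell holds the match name.
                print(td.find_next('td').text)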
    

    Prints:

    10 Aug 2018--09:30 Melbourne Knights - Port Melbourne Sharks
    10 Aug 2018--10:15 Pascoe Vale - Dandenong Thunder
    11 Aug 2018--05:00 Avondale FC - Bentleigh Greens
    11 Aug 2018--05:00 Northcote City - Bulleen
    11 Aug 2018--09:00 Hume City - Oakleigh Cannons
    12 Aug 2018--06:00 Heidelberg Utd - Green Gully
    12 Aug 2018--06:00 South Melbourne - Kingston City
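
    Note that the times printed above are in UTC. If local kickoff times are wanted instead, the same timestamp can be formatted in another zone. This is a rough sketch under two assumptions that are not in the original answer: a fixed UTC+10 offset (AEST, with no daylight-saving handling), and the timestamp 1533893400, which corresponds to the first printed row. The helper name `format_local` is hypothetical.

```python
import datetime

# Assumption: fixed UTC+10 offset (AEST); daylight saving is not handled here.
AEST = datetime.timezone(datetime.timedelta(hours=10), 'AEST')


def format_local(unix_timestamp, tz=AEST):
    """Format a Unix timestamp in the given time zone, using the answer's date format."""
    moment = datetime.datetime.fromtimestamp(unix_timestamp, tz=tz)
    return moment.strftime('%d %b %Y--%H:%M')


print(format_local(1533893400))  # 10 Aug 2018--19:30
```

    For dates that fall in daylight-saving periods, a real time zone database (for example the standard-library `zoneinfo` module) would be a safer choice than a fixed offset.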
    
