I've written a script in Python, in combination with Selenium, to parse some dates from a table on a webpage. The table is located under the header NPL Vict
I'm not using Selenium here; the dates can be extracted with BeautifulSoup alone. Each date cell stores its date-time as a Unix timestamp in one of its class names (a class of the form t followed by digits):
from bs4 import BeautifulSoup
import requests
import re
import datetime

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
r = requests.get('http://www.oddsportal.com/soccer/australia/npl-victoria/', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

for td in soup.select('table#tournamentTable td.datet'):
    for c in td['class']:
        # the date-time is stored in a class name like "t<unix timestamp>"
        m = re.match(r't(\d+)', c)
        if m:
            unix_timestamp = int(m[1])
            # timezone-aware UTC conversion (same result as the deprecated utcfromtimestamp)
            d = datetime.datetime.fromtimestamp(unix_timestamp, tz=datetime.timezone.utc).strftime('%d %b %Y--%H:%M')
            print(d)
Prints:
10 Aug 2018--09:30
10 Aug 2018--10:15
11 Aug 2018--05:00
11 Aug 2018--05:00
11 Aug 2018--09:00
12 Aug 2018--06:00
12 Aug 2018--06:00
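The key step is turning a class name of the form t<digits> into a date. A minimal stdlib-only sketch of just that step follows; the helper name kickoff_from_classes and the sample class list are hypothetical, and the timestamp was chosen so that it corresponds to the first date printed above (10 Aug 2018, 09:30 UTC). It uses the timezone-aware datetime.fromtimestamp(..., tz=timezone.utc), which gives the same result as utcfromtimestamp (deprecated since Python 3.12).

```python
import datetime
import re

# Matches a class name made only of "t" plus digits, e.g. "t1533893400".
# The trailing $ is slightly stricter than the r't\d+' used in the answer.
TIMESTAMP_CLASS = re.compile(r't(\d+)$')


def kickoff_from_classes(classes):
    """Return a UTC datetime from the first "t<digits>" class, or None if absent.

    `classes` is a list of class names, the form BeautifulSoup returns for tag['class'].
    """
    for c in classes:
        m = TIMESTAMP_CLASS.match(c)
        if m:
            return datetime.datetime.fromtimestamp(int(m[1]), tz=datetime.timezone.utc)
    return None


# Hypothetical class list; the timestamp corresponds to 10 Aug 2018 09:30 UTC
classes = ['datet', 't1533893400']
kickoff = kickoff_from_classes(classes)
print(kickoff.strftime('%d %b %Y--%H:%M'))  # 10 Aug 2018--09:30
```

Returning a datetime rather than a formatted string keeps the value usable for sorting or timezone conversion; format it only when printing.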
If you also want the matches printed:
for td in soup.select('table#tournamentTable td.datet'):
    for c in td['class']:
        m = re.match(r't(\d+)', c)
        if m:
            unix_timestamp = int(m[1])
            d = datetime.datetime.fromtimestamp(unix_timestamp, tz=datetime.timezone.utc).strftime('%d %b %Y--%H:%M')
            print(d, end=' ')
            # the next <td> after the date cell holds the "Home - Away" match name
            print(td.find_next('td').text)
Prints:
10 Aug 2018--09:30 Melbourne Knights - Port Melbourne Sharks
10 Aug 2018--10:15 Pascoe Vale - Dandenong Thunder
11 Aug 2018--05:00 Avondale FC - Bentleigh Greens
11 Aug 2018--05:00 Northcote City - Bulleen
11 Aug 2018--09:00 Hume City - Oakleigh Cannons
12 Aug 2018--06:00 Heidelberg Utd - Green Gully
12 Aug 2018--06:00 South Melbourne - Kingston City
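If you would rather collect the (date, match) pairs than print them, the same idea can be sketched with only the standard library's html.parser.HTMLParser instead of BeautifulSoup (a swapped-in parser, not what the code above uses). The class FixtureParser and the HTML snippet are hypothetical: the snippet is a heavily simplified stand-in for the page's tournamentTable, with timestamps chosen to match the first two rows printed above. Unlike the select() call above, this sketch does not check that cells sit inside table#tournamentTable.

```python
import datetime
import re
from html.parser import HTMLParser

TIMESTAMP_CLASS = re.compile(r't(\d+)$')


class FixtureParser(HTMLParser):
    """Collect (UTC datetime, text of the following <td>) pairs.

    A <td> whose classes include "datet" and a "t<digits>" class starts a pair;
    the text of the next <td> completes it.
    """

    def __init__(self):
        super().__init__()
        self.fixtures = []
        self._pending = None      # datetime waiting for its match cell
        self._in_match_td = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != 'td':
            return
        if self._pending is not None:
            # First <td> after a date cell: start collecting its text
            self._in_match_td = True
            self._text = []
            return
        classes = (dict(attrs).get('class') or '').split()
        if 'datet' in classes:
            for c in classes:
                m = TIMESTAMP_CLASS.match(c)
                if m:
                    self._pending = datetime.datetime.fromtimestamp(
                        int(m[1]), tz=datetime.timezone.utc)
                    break

    def handle_data(self, data):
        # Text inside nested tags (e.g. an <a>) of the match cell is collected too
        if self._in_match_td:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._in_match_td:
            self.fixtures.append((self._pending, ''.join(self._text).strip()))
            self._pending = None
            self._in_match_td = False


# Hypothetical, simplified markup (the real page has more cells and attributes)
SAMPLE_HTML = """
<table id="tournamentTable">
  <tr>
    <td class="datet t1533893400"></td>
    <td class="name"><a href="#">Melbourne Knights - Port Melbourne Sharks</a></td>
  </tr>
  <tr>
    <td class="datet t1533896100"></td>
    <td class="name"><a href="#">Pascoe Vale - Dandenong Thunder</a></td>
  </tr>
</table>
"""

parser = FixtureParser()
parser.feed(SAMPLE_HTML)
parser.close()
for kickoff, match in parser.fixtures:
    print(kickoff.strftime('%d %b %Y--%H:%M'), match)
```

On the live page you would feed it r.text instead of SAMPLE_HTML; BeautifulSoup's select() and find_next() remain the more robust choice when the markup is messy.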