Question
I am trying to obtain the odds from the URL below, and I get an error. Do you know what I am doing wrong?
Thank you
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = requests.get(url, headers = {'User-Agent' : 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
##print([a.text for a in soup.select('#tournamentTable tr[xeid] [href*=soccer]')])
print([b.text for b in soup.select('#tournamentTable td[xodd]')])
I expect to obtain 10 rows and 3 columns, one column per odd. However, I get the following error:
Traceback (most recent call last):
File "/Users/.py", line 14, in <module>
print([b.text for b in soup.select('#tournamentTable td[xodd]')])
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/bs4/element.py", line 1376, in select
return soupsieve.select(selector, self, namespaces, limit, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/__init__.py", line 114, in select
return compile(select, namespaces, flags, **kwargs).select(tag, limit)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/__init__.py", line 63, in compile
return cp._cached_css_compile(pattern, namespaces, custom, flags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 214, in _cached_css_compile
CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 1113, in process_selectors
return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 946, in parse_selectors
key, m = next(iselector)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 1100, in selector_iter
raise SelectorSyntaxError(msg, self.pattern, index)
File "<string>", line None
soupsieve.util.SelectorSyntaxError: Invalid character '\x1b' position 17
line 1:
#tournamentTable td[xodd]
^
...
Answer 1:
It looks like you have a wrong character between #tournamentTable and td[xodd]. It may look like a space, but its code is \x1b (the ESC control character). Try deleting this character and typing the space again.
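An invisible character like this is easier to spot by printing the selector with repr(). The snippet below is only a minimal sketch: the selector string is a hand-made reproduction of the problem, and clean_selector is a hypothetical helper, not part of BeautifulSoup or soupsieve.

```python
# Hand-made reproduction: an ESC character (\x1b) sits where the space should be.
selector = '#tournamentTable\x1btd[xodd]'

def clean_selector(s):
    # Hypothetical helper: replace every non-printable character with a plain space.
    return ''.join(ch if ch.isprintable() else ' ' for ch in s)

print(repr(selector))   # repr() makes the hidden \x1b visible
cleaned = clean_selector(selector)
print(repr(cleaned))
```

Retyping the selector by hand, as suggested above, is still the simplest fix; the helper only shows how to confirm what the hidden character is.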
I can run your code without this error. However, this page uses JavaScript to load the data, and BS (BeautifulSoup) can't run JavaScript. You may need Selenium to control a web browser, which can run JavaScript, so that you can get the HTML with the data.
Alternatively, you can use DevTools in Chrome/Firefox to check whether JavaScript reads the data from some URL, and then read the data from that same URL.
I found this URL:
https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/?_=1558215347943
The last part is the current date as a timestamp * 1000 (that is, in milliseconds):
import datetime
print(datetime.datetime.fromtimestamp(1558215347943/1000))
# 2019-05-18 23:35:47.943000
dt = datetime.datetime.now()
print(int(dt.timestamp()*1000))
# 1558216525573
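Putting the two pieces together, the request URL could be built with a fresh timestamp each time. This is a sketch under assumptions: build_ajax_url is a hypothetical helper name, the endpoint is the one quoted above, and it may no longer respond the same way.

```python
import time

# Endpoint copied from the answer above; it may have changed since then.
BASE = 'https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/'

def build_ajax_url(base=BASE):
    # Hypothetical helper: append the current time in milliseconds as the "_" parameter.
    millis = int(time.time() * 1000)
    return f'{base}?_={millis}'

ajax_url = build_ajax_url()
print(ajax_url)
```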
Using requests.Session() and better headers, I can read from this URL. It returns the data as JavaScript code, but after cutting off the wrapper around it I get data in JSON format, which can be converted to a Python dictionary:
import requests
from bs4 import BeautifulSoup as bs
import json
s = requests.Session()
headers = {
'User-Agent' : 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
}
url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = s.get(url, headers=headers)
soup = bs(r.content, 'lxml')
print(r.text.find('xodd'))
print([b.text for b in soup.select('#tournamentTable td[xodd]')])
headers = {
'User-Agent' : 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0',
'Referer': 'https://www.oddsportal.com/soccer/spain/laliga/',
}
r = s.get('https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/?_=1558215347943', headers=headers)
text = r.text[len("globals.jsonpCallback('/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/', "):-2]
data = json.loads(text)
for key, val in data['d']['oddsData'].items():
    print('xeid:', key)
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][0]['avg'])
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][1]['avg'])
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][2]['avg'])
    print('---')
Result:
xeid: ltB92yKu
xoid: 35vjqxv464x0x7qrck avg: 2.16
xoid: 35vjqxv464x0x7qrck avg: 3.5
xoid: 35vjqxv464x0x7qrck avg: 3.44
---
xeid: SW9D1eZo
xoid: 35vjrxv464x0x7qrcm avg: 1.33
xoid: 35vjrxv464x0x7qrcm avg: 5.71
xoid: 35vjrxv464x0x7qrcm avg: 8.83
---
xeid: Mg9H0Flh
xoid: 35vjsxv464x0x7qrco avg: 1.99
xoid: 35vjsxv464x0x7qrco avg: 3.79
xoid: 35vjsxv464x0x7qrco avg: 3.68
---
xeid: zcDLaZ3b
xoid: 35vjtxv464x0x7qrcq avg: 1.57
xoid: 35vjtxv464x0x7qrcq avg: 4.38
xoid: 35vjtxv464x0x7qrcq avg: 5.95
---
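The step of cutting the JavaScript wrapper off the response can also be done without hard-coding the prefix. The following is a minimal sketch, assuming the payload is a single JSON object wrapped in a function call and that no "{" appears before it. jsonp_to_dict is a hypothetical helper, and the sample string is hand-written and heavily trimmed to mirror the shape of the response, not a real capture.

```python
import json

def jsonp_to_dict(text):
    # Hypothetical helper: keep everything from the first "{" to the last "}".
    start = text.index('{')
    end = text.rindex('}') + 1
    return json.loads(text[start:end])

# Hand-written, trimmed sample imitating the wrapper format (not a real response).
sample = (
    "globals.jsonpCallback('/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/', "
    '{"d": {"oddsData": {"ltB92yKu": {"odds": [{"avg": 2.16}, {"avg": 3.5}, {"avg": 3.44}]}}}});'
)

data = jsonp_to_dict(sample)
for key, val in data['d']['oddsData'].items():
    print(key, [odd['avg'] for odd in val['odds']])
```

Compared with slicing by a fixed prefix length, this keeps working if the path inside the callback changes, but it would break if the wrapper itself ever contained braces.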
EDIT: using Selenium
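import selenium.webdriver
from selenium.webdriver.common.by import By
url = 'https://www.oddsportal.com/soccer/spain/laliga'
driver = selenium.webdriver.Firefox()
driver.get(url)
# find_elements_by_css_selector was removed in Selenium 4; find_elements with By.CSS_SELECTOR works in both 3 and 4
items = driver.find_elements(By.CSS_SELECTOR, "#tournamentTable td[xodd]")
print([x.text for x in items])
driver.quit()  # close the browser when done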
Result:
['4.26', '4.07', '1.80', '1.99', '3.79', '3.68', '1.57', '4.38', '5.95', '2.13', '3.19', '3.94', '7.82', '5.00', '1.41', '2.16', '3.50', '3.44', '1.33', '5.71', '8.83', '2.58', '3.52', '2.73', '1.49', '5.31', '5.66', '4.03', '4.21', '1.82']
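Since the question asks for 10 rows and 3 columns, the flat list can be split into groups of three. This is a sketch that assumes every match contributes exactly three odds, in page order; the list is the result printed above.

```python
# Flat list copied from the Selenium result above.
odds = ['4.26', '4.07', '1.80', '1.99', '3.79', '3.68', '1.57', '4.38', '5.95',
        '2.13', '3.19', '3.94', '7.82', '5.00', '1.41', '2.16', '3.50', '3.44',
        '1.33', '5.71', '8.83', '2.58', '3.52', '2.73', '1.49', '5.31', '5.66',
        '4.03', '4.21', '1.82']

# Assumption: three odds per match, so slice the list in steps of 3.
rows = [odds[i:i + 3] for i in range(0, len(odds), 3)]

for row in rows:
    print(row)
```

If a match is missing an odd, the rows would shift, so reading the cells row by row from the table would be safer in that case.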
Source: https://stackoverflow.com/questions/56201899/get-td-text-with-select