Question
I am trying to scrape parts of this page: https://markets.businessinsider.com/stocks/bp-stock using BeautifulSoup, by searching for text contained in the h2 titles of its tables.
When I do:
data_table = soup.find('h2', text=re.compile('RELATED STOCKS')).find_parent('div').find('table')
it correctly gets the table I am after.
When I try to get the "Analyst Opinions" table using a similar line, it returns None:
data_table = soup.find('h2', text=re.compile('ANALYST OPINIONS')).find_parent('div').find('table')
I am guessing that there might be some special characters in the HTML that prevent re from functioning as expected. I tried this too:
data_table = soup.find('h2', text=re.compile('.*?STOCK.*?INFORMATION.*?', re.DOTALL))
without success.
I would like to get the table whose title contains the text "Analyst Opinions" by checking for my requested text, without looping over all the tables.
Any idea will be highly appreciated. Best
Answer 1:
You can use a CSS selector to locate the <table>:
import requests
from bs4 import BeautifulSoup

url = 'https://markets.businessinsider.com/stocks/bp-stock'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# Select the <table> inside the <div> whose direct child <h2> contains the text.
# Note: soupsieve 2.1+ deprecates :contains in favor of :-soup-contains.
table = soup.select_one('div:has(> h2:contains("Analyst Opinions")) table')

for tr in table.select('tr'):
    print(tr.get_text(strip=True, separator=' '))
Prints:
2/26/2018 BP Outperform RBC Capital Markets
9/22/2017 BP Outperform BMO Capital Markets
More about CSS selectors here.
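To try the selector without hitting the live site, here is a minimal offline sketch. The HTML fragment is invented for illustration and is only an assumption about how the page might be laid out (a `div` wrapping an `h2` and a `table`); the real markup may differ. It uses `:-soup-contains`, the newer spelling of `:contains` in soupsieve, and `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example; the real page may differ.
HTML = """
<div class="box">
  <h2>Related Stocks</h2>
  <table><tr><td>Shell</td><td>1.2%</td></tr></table>
</div>
<div class="box">
  <h2>Analyst Opinions</h2>
  <table>
    <tr><td>2/26/2018</td><td>BP</td><td>Outperform</td></tr>
    <tr><td>9/22/2017</td><td>BP</td><td>Outperform</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# div:has(> h2:...) keeps only a <div> whose direct child <h2> contains the
# text; the trailing "table" then selects a <table> inside that <div>.
table = soup.select_one('div:has(> h2:-soup-contains("Analyst Opinions")) table')

rows = [tr.get_text(strip=True, separator=" ") for tr in table.select("tr")]
for row in rows:
    print(row)
```

Note that the text match in this selector is case-sensitive, so "ANALYST OPINIONS" would not match this fragment.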
EDIT: For a case-insensitive method, you can use the bs4 API with regular expressions (note the flags=re.I). This is the equivalent of the .select_one() call above:
import re
import requests
from bs4 import BeautifulSoup

url = 'https://markets.businessinsider.com/stocks/bp-stock'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# Find the <h2> whose full text matches, ignoring case (re.I).
h2 = soup.find(lambda t: t.name=='h2' and re.findall('analyst opinions', t.text, flags=re.I))
# Walk up to the enclosing <div>, then down to its <table>.
table = h2.find_parent('div').find('table')

for tr in table.select('tr'):
    print(tr.get_text(strip=True, separator=' '))
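As for why the original `find('h2', text=re.compile(...))` can come back empty, here are two possible causes, sketched on an invented fragment (an assumption, not the page's actual markup). First, the pattern is case-sensitive unless `re.I` is passed, and a heading displayed in capitals may be mixed-case in the HTML. Second, when a tag name is combined with `string=` (the current name for `text=`), Beautiful Soup tests the pattern against the tag's `.string`, which is `None` whenever the tag has more than one child. Matching on `get_text()` avoids both issues.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: mixed-case text plus a nested tag inside the <h2>.
HTML = """
<div>
  <h2><span class="icon"></span>Analyst Opinions</h2>
  <table><tr><td>Outperform</td></tr></table>
</div>
"""
soup = BeautifulSoup(HTML, "html.parser")

# string= on a tag is checked against tag.string, which is None here because
# the <h2> has two children (the <span> and the text node).
by_string = soup.find("h2", string=re.compile("analyst opinions", re.I))

# Matching on get_text() looks at all nested text; re.I ignores case.
pattern = re.compile(r"analyst\s+opinions", re.I)
by_text = soup.find(lambda t: t.name == "h2" and pattern.search(t.get_text()))

table = by_text.find_parent("div").find("table")
print(by_string)
print(table.get_text(strip=True))
```

In this fragment the first lookup finds nothing even with `re.I`, while the `get_text()` version reaches the table.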
Source: https://stackoverflow.com/questions/57578730/find-a-tag-using-text-it-contains-using-beautifulsoup