Question
I am web scraping with Python using the BeautifulSoup library. I have HTML markup like this:
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>
I want to get the data-url or the href value from every <tr>. Ideally, I would get the href value.
Here is a short snippet of the relevant code:
main_url = "http://localhost/test.htm"
page = requests.get(main_url).text
soup_expatistan = BeautifulSoup(page)
print (soup_expatistan.select("tr.deals").data-url)
# or print (soup_expatistan.select("tr.deals").["data-url"])
Answer 1:
You can use the tr.deals span.hotel-name a CSS selector to get to the link:
from bs4 import BeautifulSoup
data = """
<tr class="deals" data-url="www.example.com">
<span class="hotel-name">
<a href="wwwexample2.com"></a>
</span>
</tr>
"""
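# Name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")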
print(soup.select('tr.deals span.hotel-name a')[0]['href'])
Prints:
wwwexample2.com
If you have multiple links, iterate over them:
for link in soup.select('tr.deals span.hotel-name a'):
    print(link['href'])
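The question also asks about data-url, and its attempt fails because select() returns a list of tags rather than a single tag. A minimal sketch of reading both attributes follows. It assumes the two-row markup from the question and the built-in "html.parser"; the variable names (html, data_urls, hrefs) are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question (two rows).
html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name"><a href="www.example2.com"></a></span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name"><a href="www.example3.com"></a></span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of Tag objects, so loop (or index) before
# reading an attribute. [data-url] keeps only rows that have the attribute.
data_urls = [row["data-url"] for row in soup.select("tr.deals[data-url]")]

# Tag.get() returns None instead of raising KeyError if href is missing.
hrefs = [a.get("href") for a in soup.select("tr.deals span.hotel-name a")]

print(data_urls)
print(hrefs)
```

Subscript access (row["data-url"]) is needed because data-url is not a valid Python identifier, so attribute-style access such as .data-url is parsed as a subtraction.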
Source: https://stackoverflow.com/questions/26803272/get-value-of-attribute-using-css-selectors-with-beutifulsoup