Get value of attribute using CSS Selectors with BeautifulSoup

Question

I am web scraping with Python using the BeautifulSoup library.

I have HTML markup like this:

<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>

I want to get the data-url or the href value from every <tr>. Ideally I would get the href value.

Here is a little snippet of my relevant code:

import requests
from bs4 import BeautifulSoup

main_url = "http://localhost/test.htm"
page = requests.get(main_url).text
soup_expatistan = BeautifulSoup(page)

print(soup_expatistan.select("tr.deals").data-url)
# or  print(soup_expatistan.select("tr.deals").["data-url"])
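Neither print line above can work. select() returns a list of matching Tag objects rather than a single element, and Python parses .data-url as .data - url (the second form is a syntax error). A minimal sketch of reading the data-url attribute per row, assuming the question's markup is inlined instead of fetched from http://localhost/test.htm:

```python
from bs4 import BeautifulSoup

# The question's markup, inlined so the example runs without a local server.
html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>
"""

# Name the parser explicitly; "html.parser" is Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# select() returns a list, so read the attribute from each Tag.
# Dictionary-style indexing is needed because "data-url" is not a valid Python name.
# The [data-url] attribute selector only matches rows that have the attribute,
# so the indexing below cannot raise KeyError.
data_urls = [row["data-url"] for row in soup.select("tr.deals[data-url]")]
for url in data_urls:
    print(url)
```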

Answer 1:

You can use the CSS selector tr.deals span.hotel-name a to get to the link:

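from bs4 import BeautifulSoup

data = """
<tr class="deals" data-url="www.example.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
"""

# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")
# select() returns a list: take the first match and read its href attribute
print(soup.select('tr.deals span.hotel-name a')[0]['href'])

Prints:

www.example2.com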
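When only the first match is needed, BeautifulSoup also offers select_one(), which returns the first matching Tag, or None when nothing matches, instead of a list that has to be indexed. A small sketch on markup like the question's:

```python
from bs4 import BeautifulSoup

html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching Tag, or None if the selector matches nothing,
# so check for None instead of risking an IndexError from [0] on an empty list.
link = soup.select_one("tr.deals span.hotel-name a")
href = link["href"] if link is not None else None
print(href)
```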

If you have multiple links, iterate over them:

for link in soup.select('tr.deals span.hotel-name a'):
    print(link['href'])
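Since the question prefers href but would accept data-url, one option is to read both per row and fall back when the link is missing. This is a sketch on hypothetical markup (the second row has no <a>, the third has neither attribute); row_url is an illustrative helper, not part of BeautifulSoup. Tag.get() returns None for a missing attribute instead of raising KeyError:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: row 2 has no <a>, row 3 has neither href nor data-url.
html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name"><a href="www.example2.com"></a></span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name"></span>
</tr>
<tr class="deals">
<span class="hotel-name"><a></a></span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")


def row_url(row):
    """Hypothetical helper: the row's link href if present, else its data-url, else None."""
    link = row.select_one("span.hotel-name a")
    # Tag.get() returns None rather than raising KeyError when the attribute is absent.
    href = link.get("href") if link is not None else None
    return href or row.get("data-url")


urls = [row_url(row) for row in soup.select("tr.deals")]
print(urls)  # ['www.example2.com', 'www.example3.com', None]
```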

Source: https://stackoverflow.com/questions/26803272/get-value-of-attribute-using-css-selectors-with-beutifulsoup

Tags

python

css

python-3.x

beautifulsoup

html-parsing
