Question
I want to catch some tags with BeautifulSoup: some <p> tags, the <title> tag, some <meta> tags. But I want to catch them regardless of their case; I know that some sites do meta like this: <META>, and I want to be able to catch that.
I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a non-case-sensitive way?
Answer 1:
You can use soup.findAll which should match case-insensitively:
import BeautifulSoup
html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<META name="keywords" content="HTML, CSS, XML" />
<title>Test</title>
</head>
<body>
</body>
</html>'''
soup = BeautifulSoup.BeautifulSoup(html)
for x in soup.findAll('meta'):
    print x
Result:
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<meta name="keywords" content="HTML, CSS, XML" />
Answer 2:
BeautifulSoup standardises the parse tree on input. It converts tags to lower-case. You don't have anything to worry about IMO.
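The lowercasing described in this answer can be observed with nothing but Python 3's standard library. The sketch below does not use BeautifulSoup at all: it uses html.parser.HTMLParser, which the newer bs4 package can use as a parsing backend, and which passes tag names to its handlers already converted to lower case. The class name TagNameCollector and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over `tag` already lowercased,
        # whatever case the source markup used.
        self.names.append(tag)


# Sample markup with deliberately mixed-case tags (illustrative only).
html = (
    '<HTML><HEAD>'
    '<META name="keywords" content="HTML, CSS, XML" />'
    '<TITLE>Test</TITLE>'
    '</HEAD><BODY><P>Hello</P></BODY></HTML>'
)

collector = TagNameCollector()
collector.feed(html)
print(collector.names)  # ['html', 'head', 'meta', 'title', 'body', 'p']
```

Because the names are normalised while parsing, a search for 'meta' sees <meta> and <META> as the same tag, which is why no special case-insensitive matching is needed for HTML input.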
Source: https://stackoverflow.com/questions/3352563/getting-beautifulsoup-to-catch-tags-in-a-non-case-sensitive-way