I\'m parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn\'t automatically decode for me:
>>> from Be
Use html.unescape():
import html
print(html.unescape('£682m'))
FYI html.parser.HTMLParser.unescape
is deprecated, and was supposed to be removed in 3.5, although it was left in by mistake. It will be removed from the language soon.
You can use HTMLParser.unescape()
from the standard library:
>>> try:
... # Python 2.6-2.7
... from HTMLParser import HTMLParser
... except ImportError:
... # Python 3
... from html.parser import HTMLParser
...
>>> h = HTMLParser()
>>> print(h.unescape('£682m'))
£682m
You can also use the six compatibility library to simplify the import:
>>> from six.moves.html_parser import HTMLParser
>>> h = HTMLParser()
>>> print(h.unescape('£682m'))
£682m