Decode HTML entities in Python string?

名媛妹妹 2020-11-21 06:18

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:

>>> from Be         


        
6 Answers
  •  春和景丽
    2020-11-21 06:44

    Python 3.4+

    Use html.unescape():

    import html
    print(html.unescape('&pound;682m'))
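
    As a rough illustration of what html.unescape() handles, the sketch below decodes named, decimal, and hexadecimal character references. Every sample string other than '&pound;682m' is an assumption chosen for this example, not taken from the question.

```python
import html

# Sample inputs (hypothetical, apart from '&pound;682m').
samples = [
    "&pound;682m",                     # named character reference
    "&#163;682m",                      # decimal numeric reference
    "&#xA3;682m",                      # hexadecimal numeric reference
    "&lt;b&gt; &amp; &quot;x&quot;",   # markup characters
]

# Decode each string; text without entities would pass through unchanged.
decoded = [html.unescape(s) for s in samples]

for raw, text in zip(samples, decoded):
    print(repr(raw), "->", repr(text))
```

    All three spellings of the pound sign should decode to the same character, so the caller does not need to know which form the HTML used.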
    

    Note: html.parser.HTMLParser.unescape has been deprecated since Python 3.4. It was supposed to be removed in 3.5 but was left in by mistake, and it was finally removed in Python 3.9.


    Python 2.6-3.3

    You can use HTMLParser.unescape() from the standard library:

    • For Python 2.6-2.7 it's in HTMLParser
    • For Python 3 it's in html.parser
    >>> try:
    ...     # Python 2.6-2.7 
    ...     from HTMLParser import HTMLParser
    ... except ImportError:
    ...     # Python 3
    ...     from html.parser import HTMLParser
    ... 
    >>> h = HTMLParser()
    >>> print(h.unescape('&pound;682m'))
    £682m
    

    You can also use the six compatibility library to simplify the import:

    >>> from six.moves.html_parser import HTMLParser
    >>> h = HTMLParser()
    >>> print(h.unescape('&pound;682m'))
    £682m
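
    If the same code has to run on several Python versions, one possible approach is to prefer html.unescape() and fall back to HTMLParser.unescape() only where the former is missing. This is a sketch under that assumption, not part of the original answer; the module-level name `unescape` is a choice made for this example.

```python
try:
    # Python 3.4+: the supported function.
    from html import unescape
except ImportError:
    try:
        # Python 2.6-2.7
        from HTMLParser import HTMLParser
    except ImportError:
        # Python 3.0-3.3
        from html.parser import HTMLParser
    # Bind the instance method so callers use one name everywhere.
    unescape = HTMLParser().unescape

print(unescape('&pound;682m'))
```

    Trying html.unescape() first keeps the code off the deprecated method wherever the newer function exists.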
    
