Decode HTML entities in Python string?

名媛妹妹 2020-11-21 06:18

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:

>>> from Be         


        
6 answers
  •  栀梦 (OP)
    2020-11-21 06:40

    Beautiful Soup handles entity conversion. In Beautiful Soup 3, you'll need to specify the convertEntities argument to the BeautifulSoup constructor (see the 'Entity Conversion' section of the archived docs). In Beautiful Soup 4, entities get decoded automatically.

    Beautiful Soup 3

    >>> from BeautifulSoup import BeautifulSoup
    >>> BeautifulSoup("<p>&pound;682m</p>",
    ...               convertEntities=BeautifulSoup.HTML_ENTITIES)
    <p>£682m</p>

    Beautiful Soup 4

    >>> from bs4 import BeautifulSoup
    >>> BeautifulSoup("<p>&pound;682m</p>", "html.parser")
    <p>£682m</p>
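If the string is not going through Beautiful Soup at all, the standard library can probably do the decoding on its own. The sketch below is not part of the answer above; it assumes Python 3.4 or later, where `html.unescape` is available, and `decode_entities` is just a hypothetical wrapper name used for illustration.

```python
import html


def decode_entities(text: str) -> str:
    """Decode named and numeric HTML entities in text.

    Hypothetical helper: a thin wrapper around the standard library's
    html.unescape, which performs a single decoding pass.
    """
    return html.unescape(text)


# Named, decimal, and hexadecimal forms of the pound sign.
print(decode_entities("&pound;682m"))
print(decode_entities("&#163;682m"))
print(decode_entities("&#xA3;682m"))
```

Because decoding is a single pass, a double-escaped input such as `&amp;pound;` comes back as the literal text `&pound;` rather than `£`, which is usually what you want when the source escaped it on purpose.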
