BeautifulSoup gives me unicode+html symbols, rather than straight up unicode. Is this a bug or misunderstanding?

滥情空心 2021-01-13 02:53

I'm using BeautifulSoup to scrape a website. The website's page renders fine in my browser:

Oxfam International’s report entitled “Offside! http:

2 Answers
  •  终归单人心
    2021-01-13 03:08

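    It's actually UTF-8 that was mistakenly decoded as CP1252. Encoding the garbled text back to CP1252 and decoding the resulting bytes as UTF-8 recovers the intended characters: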

    >>> print u'Oxfam International\xe2€™s report entitled \xe2€œOffside!'.encode('cp1252').decode('utf8')
    Oxfam International’s report entitled “Offside!
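
The session above uses Python 2 syntax. A minimal Python 3 sketch of the same round trip follows; the helper name `fix_mojibake` is hypothetical, and it assumes the garbling really is UTF-8 bytes decoded as CP1252, leaving the text unchanged when that assumption does not hold.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as CP1252.

    Hypothetical helper for illustration: if the text cannot be
    round-tripped, it is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not CP1252-representable, or the bytes are not valid UTF-8.
        return text


# Garbled input written with escapes so the source file encoding does not matter.
garbled = (
    "Oxfam International\u00e2\u20ac\u2122s report entitled "
    "\u00e2\u20ac\u0153Offside!"
)
fixed = fix_mojibake(garbled)
print(fixed)
```

This only repairs text after the fact. If the raw response bytes are available, handing them to BeautifulSoup with `from_encoding="utf-8"` may avoid the wrong decode in the first place, assuming the page really is UTF-8.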
    
