How do I get rid of all the smart quotes while parsing a web page?

Question

This is my code:

name = namestr.decode("utf-8")

name.replace(u"\u2018", "").replace(u"\u2019", "").replace(u"\u201c","").replace(u"\u201d", "")

This doesn't seem to work. I still find &ldquo;, &rdquo;, etc. in my text. The text was parsed using Beautiful Soup.

Answer 1:

Replace the last line of your code with this one:

name = name.replace(u"\u2018", "").replace(u"\u2019", "").replace(u"\u201c","").replace(u"\u201d", "")

The replace method returns a modified string, but it does not change the string you call it on (Python strings are immutable), so you have to assign the return value back to the variable, as above.
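For what it's worth, here is a minimal sketch of the same idea in Python 3, where the chained replace calls are folded into a single `str.translate` call. It assumes the text may still contain entity references such as `&ldquo;` (with the trailing semicolon), so it decodes them with `html.unescape` first; whether your Beautiful Soup output actually contains such entities is an assumption. The helper name `strip_smart_quotes` and the sample string are made up for illustration.

```python
import html

# Map each smart-quote code point to None so str.translate deletes it.
SMART_QUOTES = dict.fromkeys(map(ord, "\u2018\u2019\u201c\u201d"))

def strip_smart_quotes(text):
    """Return a copy of text with curly quotes removed (hypothetical helper)."""
    # Turn entity references like &ldquo; into the real characters first
    # (assumption: the input may still contain undecoded entities).
    text = html.unescape(text)
    return text.translate(SMART_QUOTES)

name = "&ldquo;Hello&rdquo; \u2018world\u2019"
# Strings are immutable, so the result has to be assigned back.
name = strip_smart_quotes(name)
print(name)  # Hello world
```

As with `replace`, `translate` returns a new string, so the assignment back to `name` is what makes the change stick.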

Source: https://stackoverflow.com/questions/15751636/how-do-i-get-rid-of-all-the-smart-quotes-while-parsing-a-web-page

Tags

python

beautifulsoup

nltk

smart-quotes
