How to decode and encode a web page with Python?

慢半拍i 2021-01-07 06:19

I use BeautifulSoup and urllib2 to download web pages, but different pages use different encodings, such as utf-8, gb2312, and gbk. I use urllib2 to get sohu's home page, w

3 answers
  •  离开以前
    2021-01-07 06:33

    Another solution.

    from simplified_scrapy.request import req
    from simplified_scrapy.simplified_doc import SimplifiedDoc
    # According to this answer, req.get finds the correct encoding automatically
    html = req.get('http://www.sohu.com')
    doc = SimplifiedDoc(html)
    print(doc.title.text)
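
For comparison, here is a minimal standard-library sketch of the same idea in Python 3 (where `urllib.request` replaces `urllib2`). It is not how simplified_scrapy works internally. The helper names `decode_html` and `fetch_text`, the 4096-byte sniffing window, and the fallback order are all assumptions made for illustration. The sketch tries the charset from the HTTP header first, then a `<meta>` charset declaration, then a short fallback list.

```python
import codecs
import re
from urllib.request import urlopen

# Assumed fallback order; gb18030 is a superset of gbk and gb2312.
FALLBACK_ENCODINGS = ("utf-8", "gb18030")

# Matches both <meta charset="..."> and the older http-equiv form.
META_CHARSET_RE = re.compile(
    rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def _is_known_codec(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_html(raw, header_charset=None):
    """Decode page bytes; return (text, encoding_used).

    Hypothetical helper: candidates are tried in order and the first
    one that decodes without error wins.
    """
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    match = META_CHARSET_RE.search(raw[:4096])  # only sniff the page head
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(FALLBACK_ENCODINGS)

    for name in candidates:
        if not _is_known_codec(name):
            continue
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes.
    return raw.decode("utf-8", errors="replace"), "utf-8"


def fetch_text(url):
    """Download a page and decode it with decode_html (needs network)."""
    with urlopen(url) as resp:
        return decode_html(resp.read(), resp.headers.get_content_charset())
```

Calling `fetch_text('http://www.sohu.com')` would return the decoded text together with the encoding that worked, and the text can then be passed to BeautifulSoup. A declared charset can be wrong, which is why a failed decode falls through to the next candidate instead of raising.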
    
