My code here:
# coding:utf-8
if __name__ == '__main__':
    from urllib2 import urlopen
    url = 'http://iccna.blog.sohu.com/164572951.html'
    data =
The problem is that the server returns the response body gzip-compressed, so you have to decompress it before parsing. Try this:
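#-*- coding: utf-8 -*-
from __future__ import print_function
import gzip
import StringIO
import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://iccna.blog.sohu.com/164572951.html'
response = urllib2.urlopen(url)
data = response.read()
# The server sends the body gzip-compressed: wrap the raw bytes in a
# file-like object and let GzipFile decompress them.
if response.info().get('Content-Encoding') == 'gzip':
    gzipper = gzip.GzipFile(fileobj=StringIO.StringIO(data))
    html = gzipper.read()
else:
    html = data
# Assume the page is GBK-encoded and tell BeautifulSoup 3 which codec to use.
soup = BeautifulSoup(html, fromEncoding='gbk')
print(soup)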
The Chinese characters still look wrong on my system, but this should point you in the right direction.
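If you want to check the decompress-then-decode step without hitting the network, here is a minimal sketch using only the standard library (it runs on Python 2.7 and 3). `decode_body` and `gzip_bytes` are hypothetical helper names, and GBK is an assumption carried over from the `fromEncoding='gbk'` above, not something confirmed about the site. Instead of `GzipFile`, it uses `zlib.decompress` with `16 + zlib.MAX_WBITS`, which tells zlib to expect a gzip header; it also falls back to sniffing the gzip magic bytes in case the `Content-Encoding` header is missing.

```python
# -*- coding: utf-8 -*-
import gzip
import io
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # every gzip stream starts with these two bytes


def decode_body(raw, content_encoding=None, charset='gbk'):
    """Hypothetical helper: turn a raw HTTP body into text.

    raw              -- bytes exactly as read from the response
    content_encoding -- value of the Content-Encoding header, if any
    charset          -- assumed page encoding (GBK is a guess here)
    """
    if content_encoding == 'gzip' or raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS makes zlib parse the gzip header and trailer
        raw = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    # 'replace' substitutes U+FFFD for bytes that are invalid in the charset
    return raw.decode(charset, 'replace')


def gzip_bytes(payload):
    """Hypothetical helper: compress bytes like a server sending gzip."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as f:
        f.write(payload)
    return buf.getvalue()


if __name__ == '__main__':
    original = u'中文博客'
    body = gzip_bytes(original.encode('gbk'))
    print(decode_body(body, 'gzip') == original)  # True
```

Using `'replace'` keeps a page with a few stray bytes readable instead of raising `UnicodeDecodeError`. If you would rather fail loudly when the charset guess is wrong, drop that argument.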