Question
I'm trying to make a program that:
- reads a list of Chinese characters from a file, makes a dictionary from them (associating a sign with its meaning).
- picks a random character and sends it to the browser using the BaseHTTPServer module when it gets a GET request.
I managed to read and store the signs properly (I checked by writing them to another file, and that worked), but I couldn't figure out how to send them to my browser.
I connect to 127.0.0.1:4321, and the best I've managed is to get a (supposedly) URL-encoded Chinese character, with its translation.
Code:
# -*- coding: utf-8 -*-
import codecs
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from SocketServer import ThreadingMixIn
import threading
import random
import urllib

source = codecs.open('./signs_db.txt', 'rb', encoding='utf-16')
# Checking utf-16 works fine with chinese characters and stuff :
#out = codecs.open('./test.txt', 'wb', encoding='utf-16')
#for line in source:
#    out.write(line)

db = {}
next(source)
for line in source:
    if not line.isspace():
        tmp = line.split('\t')
        db[tmp[0]] = tmp[1].strip()

class Handler(BaseHTTPRequestHandler):

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        message = threading.currentThread().getName()
        rKey = random.choice(db.keys())
        self.wfile.write(urllib.quote(rKey.encode("utf-8")) + ' : ' + db[rKey])
        self.wfile.write('\n')
        return

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle requests in a separate thread."""

if __name__ == '__main__':
    server = ThreadedHTTPServer(('localhost', 4321), Handler)
    print 'Starting server, use <Ctrl-C> to stop'
    server.serve_forever()
If I don't URL-encode the Chinese character, I get an error from Python:
self.wfile.write(rKey + ' : ' + db[rKey])
Which gives me this:
UnicodeEncodeError : 'ascii' codec can't encode character u'\u4e09' in position 0 : ordinal not in range(128)
I've also tried encoding/decoding with 'utf-16', and I still get the same kind of error message.
Here is my test file:
Sign Translation
一 One
二 Two
三 Three
四 Four
五 Five
六 Six
七 Seven
八 Eight
九 Nine
十 Ten
So, my question is: "How can I get the Chinese characters coming from my script to display properly in my browser"?
Answer 1:
Declare the encoding of your page by writing a meta tag, and make sure to encode the entire Unicode string in UTF-8:
self.wfile.write(u'''\
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
</head>
<body>
{} : {}
</body>
</html>'''.format(rKey,db[rKey]).encode('utf8'))
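For readers on Python 3, the same idea can be sketched roughly as follows. This is a minimal sketch, not the code from the answer: `render_page` is a hypothetical helper name, and the sample sign comes from the test file in the question. The page is assembled as text and encoded to UTF-8 exactly once, so only bytes are handed to the socket.

```python
def render_page(sign, meaning):
    """Build the HTML page as text, then encode it once to UTF-8 bytes.

    `render_page` is a hypothetical helper used for illustration only.
    """
    page = (
        '<html>\n'
        '<head>\n'
        '<meta http-equiv="content-type" content="text/html;charset=UTF-8">\n'
        '</head>\n'
        '<body>\n'
        '{} : {}\n'
        '</body>\n'
        '</html>'
    ).format(sign, meaning)
    # Encoding explicitly avoids any implicit ASCII conversion on write.
    return page.encode('utf-8')


# Sample entry taken from the question's test file.
body = render_page('三', 'Three')
```

Inside a request handler, the resulting `body` would be passed to `self.wfile.write(body)`.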
And/or declare the HTTP content type:
self.send_response(200)
self.send_header('Content-Type','text/html; charset=utf-8')
self.end_headers()
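Putting both suggestions together on Python 3 might look like the sketch below. It assumes Python 3.7 or later, where `BaseHTTPServer` and `SocketServer` are replaced by `http.server` (which provides `ThreadingHTTPServer`). The in-memory `db` is a stand-in for the dictionary read from `signs_db.txt`, and `make_server` is a hypothetical helper, not part of the original code.

```python
import random
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in for the dictionary built from signs_db.txt in the question.
db = {'一': 'One', '二': 'Two', '三': 'Three'}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pick a random sign and encode the whole response text as UTF-8.
        sign = random.choice(list(db))
        body = '{} : {}\n'.format(sign, db[sign]).encode('utf-8')
        self.send_response(200)
        # Tell the browser which encoding the bytes are in.
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=4321):
    """Hypothetical helper: create a threaded server bound to localhost."""
    return ThreadingHTTPServer(('localhost', port), Handler)
```

Calling `make_server().serve_forever()` and then opening 127.0.0.1:4321 in a browser should display a sign and its translation, provided the browser honors the declared charset.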
Source: https://stackoverflow.com/questions/14174260/reading-chinese-characters-in-a-file-and-sending-them-to-a-browser