Question
I'm making a virtual machine in RPython using PyPy. My problem is that I am converting each character into its numerical representation. For example, converting the letter "a" gives 97, and then I convert the 97 to hex, so I get 0x61.
When I try to convert the letter "á" into its hexadecimal representation, which should be 0xe1, I get 0xc3 0xa1 instead.
Is there a specific encoding I need to use? Currently I'm using UTF-8.
--UPDATE--
Where instr is "á" (including the quotes):
    for char in instr:
        char = str(int(ord(char)))
        char = hex(int(char))
        char = char[2:]
        print char # Prints 22 C3 A1 22, 22 is each of the quotes
    # The desired output is 22 E1 22
Answer 1:
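    #!/usr/bin/env python
    # -*- coding: latin-1 -*-
    # The file itself must be saved as Latin-1, so 'á' is the single byte 0xE1.
    char = 'á'
    print str(int(ord(char)))
    # hex() needs the integer from ord(); int(char) would raise ValueError
    print hex(ord(char))
    # decoding to unicode first gives the same code point
    print hex(ord(char.decode('latin-1')))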
Gives me:
225
0xe1
0xe1
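One way to see why the Latin-1 source encoding gives the expected value: Latin-1 stores each of the first 256 Unicode code points as a single byte with the same value, so the byte for á is 0xE1, whereas UTF-8 needs two bytes for it. The sketch below is only an illustration and not part of the original answer. It is plain Python written to run on both Python 2 and 3, the variable names are made up for the example, and it has not been checked against RPython's restricted subset.

```python
text = u"\u00e1"  # U+00E1, LATIN SMALL LETTER A WITH ACUTE

# bytearray yields integers on both Python 2 and Python 3
latin1_bytes = bytearray(text.encode("latin-1"))
utf8_bytes = bytearray(text.encode("utf-8"))

latin1_hex = [hex(b) for b in latin1_bytes]
utf8_hex = [hex(b) for b in utf8_bytes]
codepoint_hex = hex(ord(text))

print(latin1_hex)     # ['0xe1']
print(utf8_hex)       # ['0xc3', '0xa1']
print(codepoint_hex)  # 0xe1
```

So the Latin-1 approach works for this character, but only for code points up to U+00FF; anything beyond that cannot be encoded in Latin-1 at all.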
Answer 2:
You are using version 2 of the Python language, therefore your string "á" is a byte string, and its contents depend on the encoding of your source file. If the encoding is UTF-8, they are C3 A1: the string contains two bytes.
If you want to convert it to Unicode code points (aka characters), or UTF-16 code units (depending on your Python installation), convert it to unicode first, for example using .decode('utf-8').
    # -*- encoding: utf-8 -*-
    def stuff(instr):
        for char in instr:
            char = str(int(ord(char)))
            char = hex(int(char))
            # I'd replace those two lines above with char = hex(ord(char))
            char = char[2:]
            print char
    stuff("á")
    print("-------")
    stuff(u"á")
Outputs:
c3
a1
-------
e1
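A minimal sketch of the decode-first approach, producing the format the question asks for (22 E1 22). The helper name hex_codepoints and its encoding parameter are hypothetical, introduced only for this example. It assumes the input is a byte string in a known encoding (UTF-8 by default), is written to run on both Python 2 and 3, and has not been tested under RPython.

```python
def hex_codepoints(raw, encoding="utf-8"):
    """Decode a byte string, then format each code point as uppercase hex.

    Hypothetical helper for illustration; values below 0x10 are zero-padded
    to two digits, larger code points simply use more digits.
    """
    text = raw.decode(encoding)
    return " ".join("%02X" % ord(ch) for ch in text)


# The quoted character from the question, as UTF-8 bytes: 22 C3 A1 22
raw = u'"\u00e1"'.encode("utf-8")

print(hex_codepoints(raw))  # 22 E1 22
```

Without the decode step, iterating over raw would visit the four bytes individually, which is where the C3 A1 pair in the question comes from.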
Source: https://stackoverflow.com/questions/23271542/rpython-ord-with-non-ascii-character