Question
Yesterday I wrote the following function to convert an integer to Persian digits:
def integerToPersian(number):
    listedPersian = ['۰','۱','۲','۳','۴','۵','۶','۷','۸','۹']
    listedEnglish = ['0','1','2','3','4','5','6','7','8','9']
    returnList = list()
    listedTmpString = list(str(number))
    for i in listedTmpString:
        returnList.append(listedPersian[listedEnglish.index(i)])
    return ''.join(returnList)
When you call it, for example integerToPersian(3455), it returns ۳۴۵۵, which is the equivalent of 3455 in Persian and Arabic. This function is very useful when you read a number from somewhere such as a database and want to show it in a widget.
I downloaded the Unicode code charts from http://unicode.org because I need to write PersianToInteger('unicodeString'). As I understand it, the function should take UTF-8 as its parameter, and UTF-8 stores each of these characters in 2 bytes. Also, I'm a newbie in Python.
My questions are: how can I store 2 bytes? How does UTF-8 store them? How can I split a Unicode string into another format? How can I use the Unicode code charts?
Note: I found that I could use the int() built-in function, but I couldn't get it to work. Maybe you can.
Answer 1:
You need to read the Python Unicode HOWTO for either Python 2.x or 3.x, as appropriate. But I can give you brief answers to your questions.
My questions are, how can store 2bytes? how can utf8 store , how can split an unicode string to another format ?
A unicode object holds characters; a bytes object holds bytes.
Note that in Python 2.x, str is the same thing as bytes; in 3.x, it's the same thing as unicode. And in both languages, a literal with neither a u nor a b prefix is a str. Since you didn't tell us whether you're using Python 2 or 3, I'll use explicit unicode and bytes, and u and b prefixes, everywhere.
You convert between them by picking an encoding (in this case, UTF-8) and using the encode and decode methods. For example:
>>> my_str = u'۰۱'
>>> my_bytes = b'\xdb\xb0\xdb\xb1'
>>> my_str.encode('utf-8') == my_bytes
True
>>> my_bytes.decode('utf-8') == my_str
True
If you have a UTF-8 bytes object, you should decode it to unicode as early as possible, and do all your work with it in Unicode. Then you don't have to worry about how many bytes something takes; just treat each character as a character. If you need UTF-8 output, encode back as late as possible.
(Very occasionally, the performance cost of decoding and encoding is too high, and you need to deal with UTF-8 directly. But unless that really is a bottleneck in your code, don't do it.)
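To make the "how many bytes" point concrete, here is a minimal sketch, assuming Python 3. It shows that each Persian digit is a single character in the Unicode string but occupies two bytes once encoded as UTF-8, while an ASCII digit occupies one. The variable names are only illustrative.

```python
persian = u'۰۱'        # two characters
ascii_digits = u'01'   # two characters

persian_utf8 = persian.encode('utf-8')
ascii_utf8 = ascii_digits.encode('utf-8')

# Length in characters vs. length in bytes
print(len(persian), len(persian_utf8))
print(len(ascii_digits), len(ascii_utf8))

# UTF-8 byte count of each character, taken one at a time
bytes_per_char = [len(ch.encode('utf-8')) for ch in persian]
print(bytes_per_char)
```

The byte count is a property of the encoding, not of the string, which is why it only shows up after calling encode.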
So, let's say you wanted to adapt your integerToPersian to take a UTF-8 English digit string instead of an integer, and to return a UTF-8 Persian digit string instead of a Unicode one. (I'm assuming Python 3 for the purposes of this example.) All you need to do is change str(number) to number.decode('utf-8'), and change return ''.join(returnList) to return ''.join(returnList).encode('utf-8'), and that's it.
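Put together, those two changes might look like the sketch below, assuming Python 3. The function name utf8EnglishToPersian is made up for this example; everything else follows the original function.

```python
def utf8EnglishToPersian(number):
    # `number` is assumed to be a bytes object holding UTF-8 encoded ASCII digits.
    listedPersian = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹']
    listedEnglish = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    returnList = list()
    # Decode as early as possible and work with characters, not bytes.
    listedTmpString = list(number.decode('utf-8'))
    for i in listedTmpString:
        returnList.append(listedPersian[listedEnglish.index(i)])
    # Encode as late as possible, only for the final output.
    return ''.join(returnList).encode('utf-8')


result = utf8EnglishToPersian(b'3455')
print(result.decode('utf-8'))
```

Any character that is not an ASCII digit makes listedEnglish.index raise ValueError, just as in the original function.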
how can use unicode code charts?
Python already comes with the Unicode code charts (and the right ones to match your version of Python) compiled into the unicodedata module, so usually it's a lot easier to just use those than to try to use the charts yourself. For example:
>>> import unicodedata
>>> unicodedata.digit(u'۱')
1
… i need to wrote PersianToInteger('unicodeString')
You really shouldn't need to. Unless you're using a very old Python, int should do it for you. For example, in 2.6:
>>> int(u'۱۱')
11
If it's not working for you, unicodedata is the easiest solution:
>>> numeral = u'۱۱'
>>> [unicodedata.digit(ch) for ch in numeral]
[1, 1]
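The list of digit values still has to be combined into one number. A minimal sketch of that step, assuming Python 3 (the helper name digitsToInt is hypothetical):

```python
import unicodedata


def digitsToInt(numeral):
    # Accumulate the value left to right: each new digit shifts the
    # running total one decimal place and adds its own value.
    value = 0
    for ch in numeral:
        # unicodedata.digit raises ValueError if ch has no digit value.
        value = value * 10 + unicodedata.digit(ch)
    return value


print(digitsToInt(u'۱۱'))
print(digitsToInt(u'۳۴۵۵'))
```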
However, either of these will convert digits in any script to a number, not just Persian. And there's nothing in the Unicode charts that will directly tell you that a digit is Persian; the best you can do is parse the name:
>>> all('ARABIC-INDIC DIGIT' in unicodedata.name(ch) for ch in numeral)
True
>>> all('ARABIC-INDIC DIGIT' in unicodedata.name(ch) for ch in '123')
False
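One caveat about the name test: the substring also matches the Arabic digits U+0660 to U+0669, whose names are ARABIC-INDIC DIGIT ZERO and so on, whereas the digits in the question (U+06F0 to U+06F9) are named EXTENDED ARABIC-INDIC DIGIT ZERO and so on. If that distinction matters, a stricter check could look like this sketch, assuming Python 3 (the function name isPersianNumeral is hypothetical):

```python
import unicodedata


def isPersianNumeral(text):
    # True only if text is non-empty and every character is one of the
    # EXTENDED ARABIC-INDIC digits (U+06F0..U+06F9).
    # The '' default stops unicodedata.name from raising on unnamed characters.
    return len(text) > 0 and all(
        unicodedata.name(ch, '').startswith('EXTENDED ARABIC-INDIC DIGIT')
        for ch in text
    )


print(isPersianNumeral(u'۱۱'))            # Persian digits
print(isPersianNumeral(u'\u0661\u0661'))  # Arabic digits
print(isPersianNumeral(u'123'))           # ASCII digits
```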
If you really want to do things in either direction by mapping digits from one script to another, here's a better solution:
listedPersian = ['۰','۱','۲','۳','۴','۵','۶','۷','۸','۹']
listedEnglish = ['0','1','2','3','4','5','6','7','8','9']
# Build the lookup tables once, one for each direction.
persianToEnglishMap = dict(zip(listedPersian, listedEnglish))
englishToPersianMap = dict(zip(listedEnglish, listedPersian))

def persianToNumber(persian_numeral):
    # Map each Persian digit to its ASCII counterpart, then parse as usual.
    english_numeral = ''.join(persianToEnglishMap[digit] for digit in persian_numeral)
    return int(english_numeral)
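The other direction can use englishToPersianMap in the same way. A sketch, assuming Python 3 and a non-negative integer as input (the name numberToPersian is hypothetical; the lists are repeated so the snippet stands alone):

```python
listedPersian = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹']
listedEnglish = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
englishToPersianMap = dict(zip(listedEnglish, listedPersian))


def numberToPersian(number):
    # str(number) yields ASCII digits; map each one to its Persian counterpart.
    # A minus sign or decimal point is not in the map and raises KeyError.
    return ''.join(englishToPersianMap[digit] for digit in str(number))


print(numberToPersian(3455))
```

Compared with list.index, a dict lookup avoids scanning the list for every digit, and it fails with a clear KeyError on unexpected characters.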
Source: https://stackoverflow.com/questions/18707008/unicode-and-python-issue-access-to-unicde-code-charts