Question
I have not found a good description of how to handle this problem on Windows, so I am asking here.
There are two letters in Turkish, ı (uppercase I) and i (uppercase İ), which Python handles incorrectly.
>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'i', 's']
>>> 'ı'.upper().lower()
'i'
How it should behave, given that the locale is correct:
>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'ı', 's']
>>> 'ı'.upper().lower()
'ı'
and
>>> 'i'.upper()
'İ'
>>> 'ı'.upper()
'I'
I tried locale.setlocale(locale.LC_ALL, 'Turkish_Turkey.1254') and even 'ı'.encode('cp857'), but neither helped.
How do I make Python handle these two letters correctly?
Answer 1:
You should use PyICU, which applies locale-aware case mappings:
>>> from icu import UnicodeString, Locale
>>> tr = Locale("TR")
>>> s = UnicodeString("i")
>>> print(str(s.toUpper(tr)))  # on Python 2, use unicode(...) instead of str(...)
İ
>>> s = UnicodeString("I")
>>> print(str(s.toLower(tr)))
ı
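For context, here is a small sketch of why changing the locale has no effect on the built-in methods. As far as I know, str.upper() and str.lower() in Python 3 always apply Unicode's locale-independent default case mappings, so the Turkish-specific rules are never consulted. The variable names below are only for illustration.

```python
# Built-in case methods use the Unicode default mappings, whatever the locale is.
dotted_upper = "i".upper()          # 'I', not the Turkish 'İ'
dotless_upper = "ı".upper()         # 'I'
plain_lower = "I".lower()           # 'i', not the Turkish 'ı'
round_trip = "ı".upper().lower()    # 'i': the dotless ı is lost on the way back

# 'İ' (U+0130) lowercases to 'i' followed by COMBINING DOT ABOVE (U+0307).
dotted_lower = "İ".lower()

print(dotted_upper, dotless_upper, plain_lower, round_trip)
print([hex(ord(c)) for c in dotted_lower])
```

This is why a locale-aware library such as PyICU, or an explicit mapping, is needed for Turkish text.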
Answer 2:
You can define your own hardcoded functions to handle the Turkish characters.
def tr_upper(text):
    # Map the Turkish-specific letters first, before the default mapping runs.
    text = text.replace("i", "İ")
    text = text.replace("ı", "I")
    text = text.replace("ç", "Ç")
    text = text.replace("ş", "Ş")
    text = text.replace("ü", "Ü")
    text = text.replace("ğ", "Ğ")
    return text.upper()  # for the rest, use the default upper

def tr_lower(text):
    # Map the Turkish-specific letters first, before the default mapping runs.
    text = text.replace("İ", "i")
    text = text.replace("I", "ı")
    text = text.replace("Ç", "ç")
    text = text.replace("Ş", "ş")
    text = text.replace("Ü", "ü")
    text = text.replace("Ğ", "ğ")
    return text.lower()  # for the rest, use the default lower
Regular upper:
>>> print("ulvido".upper())
ULVIDO
Our custom upper:
>>> print(tr_upper("ulvido"))
ULVİDO
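The same idea can be sketched more compactly with str.translate. This is only a variant under the assumption that the input uses the precomposed letters İ (U+0130) and ı (U+0131); the names tr_upper_tbl and tr_lower_tbl are hypothetical, not from any library. Only the two i pairs need special handling, since the default mappings already cover ç, ş, ü and ğ.

```python
# Hypothetical helpers: translation tables for the two Turkish i pairs only.
_TR_UPPER_TABLE = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER_TABLE = str.maketrans({"İ": "i", "I": "ı"})


def tr_upper_tbl(text):
    """Uppercase text using Turkish rules for i and ı."""
    # Swap the Turkish-specific letters first, then let str.upper() do the rest.
    return text.translate(_TR_UPPER_TABLE).upper()


def tr_lower_tbl(text):
    """Lowercase text using Turkish rules for İ and I."""
    # Swap the Turkish-specific letters first, then let str.lower() do the rest.
    return text.translate(_TR_LOWER_TABLE).lower()


print(tr_upper_tbl("ulvido"))
print(tr_lower_tbl("ISPARTA"))
```

Each call makes a single pass over the string for the special cases instead of one pass per letter, and the round trip tr_lower_tbl(tr_upper_tbl("ı")) gives back "ı".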
If you need this conversion often, you can put it in its own .py file. For example, save it as trtextstyle.py and import it into your projects.
If trtextstyle.py is in the same directory as your script (use the leading-dot form, from .trtextstyle import ..., only inside a package):
from trtextstyle import tr_upper, tr_lower
Hope this helps.
Source: https://stackoverflow.com/questions/19703106/python-and-turkish-capitalization