Question
I have two strings:
eng = "Clash of Clans – Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"
Now I want to check whether a string is in English or not using Python 3.
I have read this Stack Overflow answer, but it does not help me because it is a Python 2.x solution. Someone mentions in the comments that string.encode('ascii') makes it work in Python 3.x, but my problem is that it raises the same UnicodeEncodeError exception in both cases!
So now I am stuck and can't figure out how to make it work. Kindly guide me, or tell me if I have to use another method to determine whether a string is in English or not.
Thanks
Answer 1:
As with Salvador Dali's answer you linked to, you must use a try/except block to check for an encoding error.
# -*- coding: utf-8 -*-
def isEnglish(s):
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True
Just to note though, when I copied and pasted your eng and rus strings to try them, they both came up as False. Retyping the English one returned True, so I'm not sure what's up with that.
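One way to see why a copied string fails the check is to list the characters that fall outside ASCII. The following is a minimal sketch, not part of the original answer; the helper name non_ascii_chars is made up for illustration, and the en dash is written as the escape \u2013 so the example does not depend on how the text was pasted.

```python
import unicodedata

def non_ascii_chars(s):
    """Return (index, character, Unicode name) for each non-ASCII character in s.

    Hypothetical helper for illustration only.
    """
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ord(ch) > 127  # ASCII covers code points 0 through 127
    ]

# The question's English string, with the dash spelled out as an escape.
eng = "Clash of Clans \u2013 Android Apps on Google Play"

print(non_ascii_chars(eng))  # [(15, '–', 'EN DASH')]
```

An empty list means the string is pure ASCII; anything else points to the exact position and name of the offending character.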
Answer 2:
Your English string really isn't true ASCII: it contains the character U+2013 EN DASH. This looks very similar to the ASCII hyphen-minus, U+002D, but it is different.
If this is the only character you need to worry about, you can do a simple replacement to make it work:
>>> eng.replace('\u2013', '-').encode('ascii')
b'Clash of Clans - Android Apps on Google Play'
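If more than one typographic character turns up, the same idea can be extended with a translation table. This is only a sketch under assumptions: the mapping below is a hypothetical starting set rather than a complete list, and the function name is_ascii_after_normalizing is invented for this example.

```python
# Hypothetical mapping of common typographic punctuation to ASCII look-alikes.
# Extend it with whatever characters actually appear in your data.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})

def is_ascii_after_normalizing(s):
    """Replace the mapped punctuation, then test whether the result encodes as ASCII."""
    normalized = s.translate(PUNCTUATION_TO_ASCII)
    try:
        normalized.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

eng = "Clash of Clans \u2013 Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

print(is_ascii_after_normalizing(eng))  # True
print(is_ascii_after_normalizing(rus))  # False
```

The Russian string still fails because the Cyrillic letters are not in the mapping, which is the behavior the question asks for.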
Answer 3:
You can use the str.isascii() method (available since Python 3.7):
>>> rus.isascii()
False
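For code that may also run on interpreters older than 3.7, a small wrapper can fall back to checking code points by hand. This is a sketch, and the wrapper name is_ascii is an assumption made for the example. Note that the question's eng string still reports False here, because of its en dash.

```python
def is_ascii(s):
    """Return True if every character in s is in the ASCII range (U+0000 to U+007F)."""
    if hasattr(s, "isascii"):
        # str.isascii() was added in Python 3.7.
        return s.isascii()
    # Fallback for older Python 3 versions: compare each code point.
    return all(ord(ch) < 128 for ch in s)

eng = "Clash of Clans \u2013 Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

print(is_ascii(rus))                        # False
print(is_ascii(eng))                        # False (contains U+2013 EN DASH)
print(is_ascii(eng.replace("\u2013", "-"))) # True
```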
Source: https://stackoverflow.com/questions/33004065/how-to-check-if-string-is-100-ascii-in-python-3