How to check whether a string contains Cyrillic characters?
E.g.
>>> has_cyrillic('Hello, world!')
False
>>> has_cyrillic('Приве
The third-party regex module supports Unicode properties, along with a few short forms:
>>> import regex
>>> regex.search(r'\p{IsCyrillic}', 'Hello, world!')
>>> regex.search(r'\p{IsCyrillic}', 'Привет, world!')
<regex.Match object; span=(0, 1), match='П'>
>>> regex.search(r'\p{IsCyrillic}', 'Hello, wёrld!')
<regex.Match object; span=(8, 9), match='ё'>
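If installing the third-party regex module is not an option, a rough stdlib-only approximation can be sketched with unicodedata.name(), assuming it is acceptable to treat a character as Cyrillic whenever its Unicode name mentions "CYRILLIC". This is not the same thing as the script property used by \p{IsCyrillic} and may disagree at the edges (for example, a few combining marks whose names mention CYRILLIC); the function name is hypothetical.

```python
import unicodedata


def has_cyrillic_by_name(text: str) -> bool:
    # Approximation (assumption): a character counts as Cyrillic if its
    # Unicode name contains "CYRILLIC", e.g. "CYRILLIC SMALL LETTER IO" for 'ё'.
    # Characters without a name get the default "" and never match.
    return any("CYRILLIC" in unicodedata.name(c, "") for c in text)


print(has_cyrillic_by_name("Hello, world!"))   # False
print(has_cyrillic_by_name("Привет, world!"))  # True
print(has_cyrillic_by_name("Hello, wёrld!"))   # True
```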
Suggesting a method that is faster than the regex-based ones discussed here.
Approach #1:
len("экономия3r4".encode("ascii", "ignore")) < len("экономия3r4")
246 ns ± 7.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
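This evaluates to True when the string contains at least one non-ASCII character. Note that it does not check for Cyrillic specifically: accented Latin letters, CJK characters or emoji also make it True.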
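Because encoding with "ignore" can only drop characters, the ASCII-only bytes can never be longer than the text, so the check has to compare with `<`. A minimal sketch wrapping it in a function, next to the built-in str.isascii() (Python 3.7+), which answers the same question; both function names are hypothetical.

```python
def has_non_ascii(text: str) -> bool:
    # encode(..., "ignore") silently drops every non-ASCII character, so the
    # byte string is shorter than the text exactly when one was present.
    return len(text.encode("ascii", "ignore")) < len(text)


def has_non_ascii_builtin(text: str) -> bool:
    # str.isascii() gives the same answer without building a bytes object.
    return not text.isascii()


for sample in ("экономия3r4", "Hello, world!", "caf\u00e9"):
    print(sample, has_non_ascii(sample), has_non_ascii_builtin(sample))
```

The "café" sample is reported as True by both functions even though it contains no Cyrillic, which is the limitation of this approach.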
Approach #2:
Discussed in an earlier post by Max:
import re

def has_cyrillic(text):
    return bool(re.search('[а-яА-Я]', text))

has_cyrillic("экономия3r4")
929 ns ± 20.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
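The timings above look like IPython %timeit output. A sketch for re-running the comparison in plain Python with the standard timeit module; absolute numbers depend on hardware and Python version, so none are claimed here. The function names and repeat counts are illustrative choices, not part of the original answers.

```python
import re
import timeit

SAMPLE = "экономия3r4"


def via_encode(text: str) -> bool:
    # Approach #1: any non-ASCII character shortens the ASCII encoding.
    return len(text.encode("ascii", "ignore")) < len(text)


def via_regex(text: str) -> bool:
    # Approach #2: look for a letter in the а-я / А-Я ranges
    # (re caches the compiled pattern between calls).
    return bool(re.search('[а-яА-Я]', text))


def time_per_call(func, number: int = 50_000) -> float:
    # Best of 5 repeats, converted to seconds per single call.
    best = min(timeit.repeat(lambda: func(SAMPLE), number=number, repeat=5))
    return best / number


for func in (via_encode, via_regex):
    print(f"{func.__name__}: {time_per_call(func) * 1e9:.0f} ns per call")
```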
You can use a regular expression to check whether a string contains characters in the а-я, А-Я range:
import re

def has_cyrillic(text):
    return bool(re.search('[а-яА-Я]', text))
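One caveat with the а-я/А-Я class: а-я covers U+0430 to U+044F and А-Я covers U+0410 to U+042F, while ё (U+0451) and Ё (U+0401) lie outside both ranges. A string whose only Cyrillic letters are ё/Ё is therefore not detected. A minimal sketch that adds them explicitly (the function name is hypothetical):

```python
import re


def has_russian_letter(text: str) -> bool:
    # ё and Ё are outside the а-я / А-Я code point ranges, so list them too.
    return bool(re.search("[а-яА-ЯёЁ]", text))


print(bool(re.search("[а-яА-Я]", "Ёё")))  # False: both letters are outside the ranges
print(has_russian_letter("Ёё"))           # True
```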
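Alternatively, you can match the whole basic Cyrillic Unicode block (U+0400 to U+04FF); note that some rarer Cyrillic characters live in other blocks, such as Cyrillic Supplement (U+0500 to U+052F):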
import re

def has_cyrillic(text):
    return bool(re.search('[\u0400-\u04FF]', text))
This also matches Cyrillic letters outside the basic а-я/А-Я ranges, such as ё, Є and ў.
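The same code point range check can be done without re, since one-character strings compare by code point. A sketch under that approach (the function name is hypothetical):

```python
def has_cyrillic_block(text: str) -> bool:
    # Equivalent to searching for [\u0400-\u04FF]: a single character c is in
    # the basic Cyrillic block when U+0400 <= c <= U+04FF.
    return any("\u0400" <= c <= "\u04FF" for c in text)


print(has_cyrillic_block("Hello, world!"))  # False
print(has_cyrillic_block("ўЄё"))            # True
```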
You could create a set containing the Cyrillic letters and just check each character of the string:
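# Russian alphabet only; add letters such as є, і, ї, ў for other languages
cyrillic_letters = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ")

def has_cyrillic(text):
    for c in text:
        if c in cyrillic_letters:
            return True
    return False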
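A more compact variant of the same set-membership idea uses frozenset.isdisjoint() instead of an explicit loop. As above, the letter set is an assumption (Russian alphabet only), and the function name is hypothetical.

```python
# Russian alphabet only (assumption); extend the string for other languages.
CYRILLIC_LETTERS = frozenset(
    "абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
)


def has_cyrillic_set(text: str) -> bool:
    # isdisjoint() accepts any iterable, including a str, which it walks
    # character by character; "not disjoint" means at least one shared letter.
    return not CYRILLIC_LETTERS.isdisjoint(text)


print(has_cyrillic_set("Hello, world!"))   # False
print(has_cyrillic_set("Привет, world!"))  # True
```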