The right way to check if a string has Hebrew chars

Submitted by 做~自己de王妃 on 2019-12-06 14:08:23

Question


In Unicode, Hebrew characters fall between code points 1424 and 1514 (hex 0590 to 05EA).
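The decimal and hex bounds above are the same numbers. U+05EA is tav, the last Hebrew letter, but the Unicode Hebrew block itself continues to U+05FF. The short check below confirms both points using only the standard `unicodedata` module:

```python
import unicodedata

# The decimal bounds in the question are the same numbers as the hex code points.
LOW, HIGH = 0x0590, 0x05EA
print(LOW, HIGH)  # 1424 1514

# Alef (U+05D0) is the first Hebrew letter; tav (U+05EA) is the last.
print(unicodedata.name(chr(0x05D0)))  # HEBREW LETTER ALEF
print(unicodedata.name(chr(HIGH)))    # HEBREW LETTER TAV

# The Unicode Hebrew block itself continues past tav, up to U+05FF.
BLOCK_END = 0x05FF
print(BLOCK_END)  # 1535
```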

I'm looking for the most correct, efficient, and Pythonic way to check whether a string contains any Hebrew characters.

First I came up with this:

def has_hebrew(s):
    for c in s:
        if ord(c) >= 1424 and ord(c) <= 1514:
            return True
    return False

Then I came up with a more elegant implementation:

def has_hebrew(s):
    return any(map(lambda c: (ord(c) >= 1424 and ord(c) <= 1514), s))

And maybe:

def has_hebrew(s):
    return any([(ord(c) >= 1424 and ord(c) <= 1514) for c in s])
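The map and list-comprehension versions differ in one practical way. In Python 3, `map` returns a lazy iterator, so `any()` can stop at the first Hebrew character. The list comprehension builds the whole list of booleans before `any()` sees it. In Python 2, `map` also builds a list.

The counting helper `in_range` below is hypothetical and exists only to make the difference visible:

```python
calls = 0

def in_range(c):
    """Hypothetical helper: the question's range test, counting each call."""
    global calls
    calls += 1
    return 1424 <= ord(c) <= 1514

# One Hebrew letter (alef) followed by 999 ASCII characters.
s = "\u05d0" + "x" * 999

calls = 0
any(map(in_range, s))          # lazy: any() stops after the first character
lazy_calls = calls

calls = 0
any([in_range(c) for c in s])  # eager: all 1000 characters are tested first
eager_calls = calls

print(lazy_calls, eager_calls)  # 1 1000
```

A generator expression passed to `any()` is lazy in the same way as `map`.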

Which of these is best? Or should I do it differently?


Answer 1:


You could do:

# Python 3.
def has_hebrew(s):
    return any("\u0590" <= c <= "\u05EA" for c in s)

# Python 2 (s must be a unicode string, not a byte str).
def has_hebrew(s):
    return any(u"\u0590" <= c <= u"\u05EA" for c in s)
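The range in the question stops at U+05EA (tav). That leaves out the rest of the Unicode Hebrew block (U+0590 to U+05FF), which holds Yiddish ligatures and punctuation such as geresh and gershayim. It also leaves out the Hebrew presentation forms at U+FB1D to U+FB4F.

The Python 3 sketch below counts both ranges as Hebrew. That choice is an assumption about what "Hebrew" should mean, and the constant `HEBREW_RANGES` is hypothetical:

```python
# Hypothetical constant: the ranges this sketch treats as "Hebrew".
HEBREW_RANGES = (
    ("\u0590", "\u05FF"),  # Unicode Hebrew block (letters, points, punctuation)
    ("\uFB1D", "\uFB4F"),  # Hebrew presentation forms
)

def has_hebrew(s):
    """Return True if any character of s falls in one of HEBREW_RANGES."""
    return any(lo <= c <= hi for c in s for lo, hi in HEBREW_RANGES)

print(has_hebrew("hello"))             # False
print(has_hebrew("shalom \u05e9\u05dc\u05d5\u05dd"))  # True
print(has_hebrew("\u05f4"))            # True: gershayim, outside 0590 to 05EA
```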



Answer 2:


Your basic options are:

  1. Match against a regex containing the range of characters; or
  2. Iterate over the string, testing for membership of the character in a string or set containing all of your target characters, and break if you find a match.
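Both options can be sketched in Python 3 as follows. The pattern and the set use the question's range, U+0590 to U+05EA. The names `has_hebrew_regex`, `has_hebrew_set`, `HEBREW_RE`, and `HEBREW_CHARS` are hypothetical.

```python
import re

# Option 1: a compiled regex whose character class covers the range.
HEBREW_RE = re.compile("[\u0590-\u05EA]")

def has_hebrew_regex(s):
    # re.search stops scanning at the first match.
    return HEBREW_RE.search(s) is not None

# Option 2: membership in a precomputed set, returning at the first match.
HEBREW_CHARS = frozenset(chr(cp) for cp in range(0x0590, 0x05EA + 1))

def has_hebrew_set(s):
    for c in s:
        if c in HEBREW_CHARS:
            return True
    return False
```

The set holds 91 characters (1514 - 1424 + 1). The regex moves the per-character loop into C code inside the `re` module.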

Only benchmarking on your own data can show which one is faster.
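A minimal benchmark with the standard `timeit` module might look like the sketch below. The three candidate functions are hypothetical restatements of the approaches on this page. The sample string is an assumption: a long ASCII-only text, which forces every candidate to scan the whole input. Timings will vary by machine and Python version, so none are listed here.

```python
import re
import timeit

# Candidate implementations (hypothetical names), all using U+0590..U+05EA.
HEBREW_RE = re.compile("[\u0590-\u05EA]")

def with_regex(s):
    return HEBREW_RE.search(s) is not None

def with_any(s):
    return any("\u0590" <= c <= "\u05EA" for c in s)

def with_loop(s):
    for c in s:
        if 1424 <= ord(c) <= 1514:
            return True
    return False

# Worst case: no Hebrew at all, so every candidate scans the entire string.
# This sample is an assumption; substitute text resembling your real input.
sample = "plain ASCII text " * 1000

results = {}
for func in (with_regex, with_any, with_loop):
    # timeit runs immediately, so the lambda sees the current func.
    seconds = timeit.timeit(lambda: func(sample), number=100)
    results[func.__name__] = seconds
    print(f"{func.__name__}: {seconds:.4f} s for 100 calls")
```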




Answer 3:


It's simple to check the first character with unicodedata:

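import unicodedata

def is_greek(term):
    # Looks only at the first non-whitespace character.
    stripped = term.strip()
    return bool(stripped) and 'GREEK' in unicodedata.name(stripped[0], '')


def is_hebrew(term):
    # Same check as is_greek, for Hebrew.
    stripped = term.strip()
    return bool(stripped) and 'HEBREW' in unicodedata.name(stripped[0], '')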
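Checking only the first character answers a narrower question than the one asked. The sketch below applies the same name-based test to every character; `contains_script` is a hypothetical name.

Passing a default to `unicodedata.name` matters here. Without it, the call raises `ValueError` for characters that have no name, such as `"\n"`.

```python
import unicodedata

def contains_script(text, script):
    """Return True if any character's Unicode name mentions `script`."""
    # unicodedata.name(c, "") returns "" instead of raising for unnamed characters.
    return any(script in unicodedata.name(c, "") for c in text)

print(contains_script("abc \u05d0", "HEBREW"))   # True
print(contains_script("abc\n", "HEBREW"))        # False
print(contains_script("\u03b1\u03b2", "GREEK"))  # True
```

This heuristic relies on character names containing the script's name. It also catches Hebrew presentation forms outside the U+0590 block.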


Source: https://stackoverflow.com/questions/10664254/the-right-way-to-check-of-a-string-has-hebrew-chars
