Checking for illegal surrogates in Python 3 strings

大城市里の小女人 submitted on 2019-12-20 02:37:14

Question


Specifically in Python 3.3 and above, is it sufficient to check for orphan surrogates by using the simple match:

re.search(r'[\uD800-\uDFFF]', s)

The assumption is that all legal surrogates would already have been represented as astral code points and thus would not match, so only the illegal surrogates would be caught. Or are there caveats and edge cases one needs to be aware of?
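Below is a minimal sketch of the check described in the question, assuming Python 3.3+ (where `re` accepts `\uXXXX` escapes in patterns). The helper names `has_lone_surrogate` and `has_lone_surrogate_noregex` are hypothetical. The second function is simply an equivalent loop over `ord()` values for readers who prefer to avoid `re`:

```python
import re

# Pattern from the question: any code point in the surrogate block U+D800..U+DFFF.
# On Python 3.3+ (PEP 393) a str is a sequence of code points, so a well-formed
# astral character such as U+1F600 is one code point outside this range.
_SURROGATE_RE = re.compile(r'[\uD800-\uDFFF]')


def has_lone_surrogate(s: str) -> bool:
    """Hypothetical helper: True if s contains any code point in U+D800..U+DFFF."""
    return _SURROGATE_RE.search(s) is not None


def has_lone_surrogate_noregex(s: str) -> bool:
    """Hypothetical helper: the same check without the re module."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


# ascii() is used so printing never fails on a non-UTF-8 terminal.
for text in ["plain ASCII", "emoji \U0001F600", "lone \ud800", "escaped \udcff"]:
    print(ascii(text), has_lone_surrogate(text), has_lone_surrogate_noregex(text))
```

Both functions scan the whole string in the worst case. The precompiled regex is usually the faster of the two in CPython, while the `ord()` version makes the numeric range explicit.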


Answer 1:


Yes, that's correct. Code units 0xD800–0xDFFF don't represent valid characters in wide Unicode strings, and in Python 3.3+ (following PEP 393) all Unicode strings are effectively wide.
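The following sketch, assuming CPython 3.3 or later, illustrates the answer's point. An astral character is a single code point and never matches. Any surrogate code point that does appear in a str is matched, and that includes two code units typed out separately as a would-be pair. The `surrogateescape` case is one common way lone surrogates end up in real strings (it is used, for example, for undecodable file names on POSIX systems, per PEP 383). The variable names are illustrative only.

```python
import re

SURROGATE_RE = re.compile(r'[\uD800-\uDFFF]')

# An astral character is one code point on 3.3+; there are no UTF-16 pairs inside str.
emoji = "\U0001F600"
print(len(emoji), SURROGATE_RE.search(emoji))  # 1 None

# Writing the two UTF-16 code units separately does NOT produce U+1F600:
# the str holds two surrogate code points, and both are matched.
pair_as_units = "\ud83d\ude00"
print(len(pair_as_units), len(SURROGATE_RE.findall(pair_as_units)))  # 2 2

# The surrogateescape error handler maps an undecodable byte 0xFF to U+DCFF.
escaped = b"caf\xff".decode("utf-8", "surrogateescape")
print(ascii(escaped), bool(SURROGATE_RE.search(escaped)))  # 'caf\udcff' True

# Such a string cannot be encoded to UTF-8 with the default 'strict' handler.
try:
    escaped.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 encode fails:", exc.reason)  # surrogates not allowed
```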



Source: https://stackoverflow.com/questions/32563944/checking-for-illegal-surrogates-in-python-3-strings
