Checking for illegal surrogates in Python 3 strings

夕颜 2021-01-21 07:53

Specifically, in Python 3.3 and above, is it sufficient to check for orphan surrogates with this simple match:

re.search(r'[\uD800-\uDFFF]', s)
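A minimal sketch of the check above, wrapped in a helper so it can be tried directly. The name `has_surrogates` is hypothetical and not part of any library. The raw string works because the `re` module itself understands `\uXXXX` escapes in patterns (Python 3.3+).

```python
import re

# Pattern from the question: any code point in U+D800..U+DFFF.
_SURROGATE_RE = re.compile(r'[\uD800-\uDFFF]')


def has_surrogates(s: str) -> bool:
    """Return True if s contains at least one surrogate code point."""
    return _SURROGATE_RE.search(s) is not None


print(has_surrogates('plain ASCII'))       # False
print(has_surrogates('emoji \U0001F600'))  # False: a single astral code point
print(has_surrogates('bad \udc80 byte'))   # True: a lone surrogate
```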


        
相关标签:
1 answer
  • 2021-01-21 08:44

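    Yes, that's correct. Code points U+D800–U+DFFF are reserved for UTF-16 surrogates and never represent valid characters on their own in wide Unicode strings. In Python 3.3+ (following PEP 393), all Unicode strings are effectively wide: a str is a sequence of code points, not UTF-16 code units, so an astral character such as U+1F600 is stored as a single code point rather than as a surrogate pair. Any code point from the surrogate range that appears in a str is therefore an orphan, and a character-class match over that range finds all of them.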
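    The following sketch (assuming CPython 3.3 or later) illustrates the reasoning above: an astral character has length 1, invalid bytes decoded with the `surrogateescape` error handler become lone surrogates, and two adjacent surrogate escapes are not merged into one character. The helper `encodable_as_utf8` is a hypothetical cross-check, not something the answer proposes; it relies on the strict UTF-8 codec rejecting surrogate code points, so it should agree with the regex.

```python
import re

SURROGATE_RE = re.compile(r'[\uD800-\uDFFF]')

# An astral character is one code point on 3.3+, not a surrogate pair.
emoji = '\U0001F600'
print(len(emoji))                   # 1
print(SURROGATE_RE.search(emoji))   # None

# Undecodable bytes mapped through surrogateescape become lone surrogates.
s = b'abc\xff'.decode('utf-8', 'surrogateescape')
print(hex(ord(s[-1])))              # 0xdcff
print(bool(SURROGATE_RE.search(s)))  # True

# Adjacent surrogate escapes stay two separate code points; Python does
# not join them, so the regex still flags them.
pair = '\ud83d\ude00'
print(len(pair))                    # 2
print(bool(SURROGATE_RE.search(pair)))  # True


def encodable_as_utf8(text: str) -> bool:
    """Hypothetical cross-check: strict UTF-8 refuses surrogate code points."""
    try:
        text.encode('utf-8')
    except UnicodeEncodeError:
        return False
    return True


print(encodable_as_utf8(emoji))  # True
print(encodable_as_utf8(s))      # False
print(encodable_as_utf8(pair))   # False
```

    The regex is the cheaper test when you only need a yes/no answer, since it avoids building an encoded copy of the string.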
