Regular expression that finds and replaces non-ascii characters with Python

前端 未结 7 2187
無奈伤痛
無奈伤痛 2020-12-03 19:40

I need to change some characters that are not ASCII to \'_\'. For example,

Tannh‰user -> Tannh_user
  • If I use regular expression with Python, how
相关标签:
7条回答
  • 2020-12-03 20:28

    To answer the question

    '[\u0080-\uFFFF]'
    

    will match any UTF-8 character not in the range of the first 128 characters

    re.sub('[\u0080-\uFFFF]+', '_', x)
    

    will replace any sequence of consecutive nonascii characters with an underscore

    0 讨论(0)
提交回复
热议问题