Convert fancy/artistic unicode text to ASCII

Submitted by 放肆的年华 on 2021-01-19 08:53:37

Question


I have a unicode string like "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" and would like to convert it to the ASCII form "thug life".

I know I can achieve this in Python by

import unidecode
print(unidecode.unidecode('𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊'))
# thug life

However, this would also asciify other Unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.

Is there a way to detect these types of "artistic" Unicode characters?

Some more examples:

𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮

𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒

𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖

thug life

Thanks for your help!


Answer 1:


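import unicodedata

# Sample inputs from the question
strings = [
    '𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊',
    '𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮',
    '𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒',
    '𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖',
    'thug life']

# NFKC (compatibility normalization) maps the styled letters to plain ones
for x in strings:
    print(unicodedata.normalize('NFKC', x), x)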

Output: .\62803325.py

thug life 𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊
thug life 𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮
thug life 𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒
thug life 𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖
thug life thug life
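
As a quick check against the requirement in the question, the sketch below runs the same NFKC call on a mixed string (the sample text is made up for illustration). Accented letters, CJK ideographs and emoji come through unchanged, but NFKC is not limited to styled letters: it also folds other compatibility characters such as circled digits, ligatures and superscripts.

```python
import unicodedata

# Made-up sample: styled letters next to text that should be preserved
mixed = "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊 café 日本語 😀"
normalized = unicodedata.normalize("NFKC", mixed)
print(normalized)  # thug life café 日本語 😀

# Side effect to be aware of: these compatibility characters are folded too
for ch in ["①", "ﬁ", "²"]:
    print(repr(ch), "->", repr(unicodedata.normalize("NFKC", ch)))
```

If characters like these must survive as well, normalizing the whole string is probably too broad and a per-character filter is needed.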

Resources:

  • unicodedata — Unicode Database
  • Normalization forms for Unicode text


Source: https://stackoverflow.com/questions/62803325/convert-fancy-artistic-unicode-text-to-ascii
