问题
I have a unicode string like "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" and would like to convert it to the ASCII form "thug life".
I know I can achieve this in Python by
import unidecode
print(unidecode.unidecode('𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊'))
// thug life
However, this would asciify also other unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.
Is there a way to detect these type of "artistic" unicode characters?
Some more examples:
𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮
𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒
𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖
thug life
Thanks for your help!
回答1:
import unicodedata
strings = [
'𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊',
'𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮',
'𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒',
'𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖',
'thug life']
for x in strings:
print(unicodedata.normalize( 'NFKC', x), x)
Output: .\62803325.py
thug life 𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊 thug life 𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮 thug life 𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒 thug life 𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖 thug life thug life
Resources:
unicodedata
— Unicode Database- Normalization forms for Unicode text
来源:https://stackoverflow.com/questions/62803325/convert-fancy-artistic-unicode-text-to-ascii