Question
I have the following dataset:
'Fʀɪᴇɴᴅ',
'ᴍᴏᴍ',
'ᴍᴀᴋᴇs',
'ʜᴏᴜʀʟʏ',
'ᴛʜᴇ',
'ᴄᴏᴍᴘᴜᴛᴇʀ',
'ʙᴇᴇɴ',
'ᴏᴜᴛ',
'ᴀ',
'ᴊᴏʙ',
'ғᴏʀ',
'ᴍᴏɴᴛʜs',
'ʙᴜᴛ',
'ʟᴀsᴛ',
'ᴍᴏɴᴛʜ',
'ʜᴇʀ',
'ᴄʜᴇᴄᴋ',
'ᴊᴜsᴛ',
'ᴡᴏʀᴋɪɴɢ',
'ғᴇᴡ',
'ʜᴏᴜʀs',
'sᴏᴜʀᴄᴇ',
I want to convert them into ASCII using a Python script, for example:
Fʀɪᴇɴᴅ - FRIEND
ᴍᴏᴍ - MOM
I have tried encoding and decoding, but that doesn't work. I have also tried this solution, but it doesn't solve my problem.
Answer 1:
Python doesn't provide a way to directly convert small caps characters to their ASCII equivalents. However, it's possible to do this using str.translate.
To use str.translate, we need to create a mapping from the small caps characters' ordinal values to ASCII characters.
To get the ordinal values, we can construct the Unicode name of each character, look the character up in the unicodedata database, and call ord on it. Note that there is no small caps 'X' character, and that the unicodedata database in Python versions before 3.7 does not contain small caps 'Q'.
>>> from string import ascii_uppercase
>>> import unicodedata as ud
>>> # Filter out unsupported characters (run only the `letters` line that matches your Python version)
>>> # Python < 3.7
>>> letters = (x for x in ascii_uppercase if x not in ('Q', 'X'))
>>> # Python >= 3.7
>>> letters = (x for x in ascii_uppercase if x != 'X')
>>> mapping = {ord(ud.lookup('LATIN LETTER SMALL CAPITAL ' + x)): x for x in letters}
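If you would rather not hard-code which letters are missing for each Python version, one option is to attempt the lookup for every letter and skip the names that the local unicodedata database doesn't know. This is only a sketch of that idea; the function name build_small_caps_mapping is made up for illustration and is not part of any library.

```python
from string import ascii_uppercase
import unicodedata as ud


def build_small_caps_mapping():
    """Map the ordinal of each available small capital to its ASCII letter.

    Hypothetical helper: letters without a small-capital character in this
    Python's Unicode database are skipped instead of being listed by hand.
    """
    mapping = {}
    for letter in ascii_uppercase:
        try:
            small_cap = ud.lookup('LATIN LETTER SMALL CAPITAL ' + letter)
        except KeyError:
            # unicodedata.lookup raises KeyError for unknown names,
            # e.g. 'X', or 'Q' on Python versions before 3.7.
            continue
        mapping[ord(small_cap)] = letter
    return mapping


mapping = build_small_caps_mapping()
# Show which ASCII letters ended up with a small-capital counterpart.
print(''.join(sorted(mapping.values())))
```

The resulting dict has the same shape as the mapping built above (ordinals as keys, single ASCII letters as values), so it can be used in the same way.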
Once we have the mapping, we can use str.maketrans to turn it into a translation table for str.translate, then perform the conversions.
>>> # Make a translation table
>>> tt = str.maketrans(mapping)
>>> # Use the table to "translate" strings to their ASCII equivalent.
>>> s = 'ᴍᴏɴᴛʜ'
>>> s.translate(tt)
'MONTH'
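One caveat with the question's data: some words appear to mix in characters that are not small capitals at all, such as a plain ASCII 's' in 'ᴍᴀᴋᴇs' and what looks like the Cyrillic letter 'ғ' (U+0493) standing in for 'F' in 'ғᴏʀ'. A translation table built only from small-capital names leaves those untouched. The sketch below is one possible way to handle them, using a different technique from the one above: it inspects each character's Unicode name rather than building a table up front. The names to_ascii_caps and LOOKALIKES are hypothetical, and treating U+0493 as 'F' is an assumption about this dataset, not a general rule.

```python
import unicodedata as ud

PREFIX = 'LATIN LETTER SMALL CAPITAL '

# Assumption: in this dataset U+0493 (a Cyrillic letter) is used as a
# stand-in for a small-capital F. Extend this dict for other lookalikes.
LOOKALIKES = {'\u0493': 'F'}


def to_ascii_caps(word):
    """Replace small capitals and known lookalikes, then upper-case the rest."""
    out = []
    for ch in word:
        name = ud.name(ch, '')  # '' for characters without a name
        suffix = name[len(PREFIX):]
        if name.startswith(PREFIX) and len(suffix) == 1:
            # e.g. 'LATIN LETTER SMALL CAPITAL M' -> 'M'; longer suffixes
            # such as 'AE' or 'INVERTED R' are deliberately left alone.
            out.append(suffix)
        else:
            out.append(LOOKALIKES.get(ch, ch))
    # Upper-casing catches ordinary lowercase letters such as the 's'.
    return ''.join(out).upper()


for word in ['Fʀɪᴇɴᴅ', 'ᴍᴀᴋᴇs', 'ғᴏʀ']:
    print(word, '-', to_ascii_caps(word))
```

Checking names one character at a time is slower than str.translate on large inputs, so for bigger datasets it may be preferable to add the lookalikes to the translation table and call upper() on the translated string.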
Source: https://stackoverflow.com/questions/55717223/convert-unicode-small-capitals-to-their-ascii-equivalents