How do you convert a Unicode string (containing extra characters like £ $, etc.) into a Python string?
The file contains a Unicode-escaped string:
\"message\": \"\\u0410\\u0432\\u0442\\u043e\\u0437\\u0430\\u0446\\u0438\\u044f .....\",
This worked for me:
f = open("56ad62-json.log", encoding="utf-8")
qq=f.readline()
print(qq)
{"log":\"message\": \"\\u0410\\u0432\\u0442\\u043e\\u0440\\u0438\\u0437\\u0430\\u0446\\u0438\\u044f \\u043f\\u043e\\u043b\\u044c\\u0437\\u043e\\u0432\\u0430\\u0442\\u0435\\u043b\\u044f\"}
(qq.encode().decode("unicode-escape").encode().decode("unicode-escape"))
# '{"log":"message": "Авторизация пользователя"}\n'
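One caveat with the encode/decode round trip above: "unicode-escape" reads the bytes as Latin-1, so any real non-ASCII character already in the text gets mangled (for example, 'é' comes back as 'Ã©'). A minimal sketch of an alternative that only touches the escape sequences is below. The helper name decode_unicode_escapes is made up for illustration, and it assumes that any run of backslashes followed by uXXXX is meant as an escape, which also covers the doubled backslashes in the log line.

```python
import re

# Assumption: one or more backslashes followed by uXXXX is always an escape,
# so both \u0410 and the double-escaped \\u0410 are decoded.
_ESCAPE = re.compile(r"\\+u([0-9a-fA-F]{4})")

def decode_unicode_escapes(text):
    """Replace literal \\uXXXX sequences with the characters they name."""
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Join escaped surrogate pairs (e.g. emoji) into single code points.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")

print(decode_unicode_escapes(r"\\u0410\\u0432\\u0442\\u043e"))  # Авто
print(decode_unicode_escapes(r"caf\u00e9 and café"))            # café and café
```

Characters that are already decoded pass through untouched, so it is safe to run on mixed input.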
There is a library called ftfy that can help with Unicode issues. It has made my life easier.
Example 1
import ftfy
print(ftfy.fix_text('Ã¼nicode'))
output -->
ünicode
Example 2 - UTF-8
import ftfy
print(ftfy.fix_text('\xe2\x80\xa2'))
output -->
•
Example 3 - Unicode code point
import ftfy
print(ftfy.fix_text(u'\u2026'))
output -->
…
https://ftfy.readthedocs.io/en/latest/
pip install ftfy
https://pypi.org/project/ftfy/
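Here is an example (a Python 2 session; in Python 3 the encoded value is a bytes object and is displayed with a b prefix):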
>>> u = u'€€€'
>>> s = u.encode('utf8')
>>> s
'\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
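For Python 3, the same round trip can be sketched as follows. The variable names are only illustrative. The point is that encode() turns a str into bytes and decode() turns it back.

```python
text = "€€€"

# str -> bytes: each euro sign becomes three UTF-8 bytes
data = text.encode("utf-8")
print(data)              # b'\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'

# bytes -> str: decoding with the same codec restores the original text
restored = data.decode("utf-8")
print(restored == text)  # True
```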
None of the answers worked in my case, where I had a string variable containing Unicode escape sequences (such as \u00f3), and none of the encode/decode approaches explained here did the job.
If I run this in a terminal:
echo "no me llama mucho la atenci\u00f3n"
or
python3
>>> print("no me llama mucho la atenci\u00f3n")
The output is correct:
output: no me llama mucho la atención
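But in scripts that load this string into a variable, the escape was not converted.
This is what worked in my case, in case it helps anybody:
import json
# Raw string: the variable holds a literal backslash-u escape, as when the text is loaded at runtime
string_to_convert = r"no me llama mucho la atenci\u00f3n"
print(json.dumps(json.loads('"%s"' % string_to_convert), ensure_ascii=False))
output: "no me llama mucho la atención"
(json.dumps keeps the surrounding double quotes; print the result of json.loads on its own to get the bare text.)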
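If only the decoded text is needed, the json.loads step is enough by itself. Here is a minimal sketch, with unescape_via_json as a made-up helper name. It assumes the text contains no unescaped double quotes, stray backslashes, or control characters, since any of those would make the wrapped value invalid JSON.

```python
import json

def unescape_via_json(text):
    # Assumption: text has no unescaped double quotes, stray backslashes,
    # or control characters, so wrapping it in quotes yields valid JSON.
    return json.loads('"%s"' % text)

# Raw string stands in for text loaded at runtime with a literal \u00f3 in it
raw = r"no me llama mucho la atenci\u00f3n"
print(unescape_via_json(raw))  # no me llama mucho la atención
```

Input that breaks the assumption raises json.JSONDecodeError, so failures are visible instead of silently producing wrong text.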