Convert a Unicode string to a string in Python (containing extra symbols)

夕颜 2020-11-22 01:20

How do you convert a Unicode string (containing extra characters such as £, $, etc.) into a Python string?

10 Answers
  • 2020-11-22 02:11

    My file contains a Unicode-escaped string:

    \"message\": \"\\u0410\\u0432\\u0442\\u043e\\u0437\\u0430\\u0446\\u0438\\u044f .....\",
    

    This worked for me:

     with open("56ad62-json.log", encoding="utf-8") as f:
         qq = f.readline()

     print(qq)
     # {"log":\"message\": \"\\u0410\\u0432\\u0442\\u043e\\u0440\\u0438\\u0437\\u0430\\u0446\\u0438\\u044f \\u043f\\u043e\\u043b\\u044c\\u0437\\u043e\\u0432\\u0430\\u0442\\u0435\\u043b\\u044f\"}

     # The text is escaped twice, so apply "unicode-escape" twice
     print(qq.encode().decode("unicode-escape").encode().decode("unicode-escape"))
     # {"log":"message": "Авторизация пользователя"}
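    One caveat with the round trip above: `unicode-escape` reads the bytes as Latin-1, so any non-ASCII character already present in the line (for example a literal €) would be mangled. A minimal sketch that sidesteps this by encoding with the `backslashreplace` error handler first; the helper name `decode_escapes` and the default of two passes are assumptions for illustration, not part of the original answer:

```python
def decode_escapes(text: str, passes: int = 2) -> str:
    """Undo backslash escaping `passes` times without mangling non-ASCII text."""
    for _ in range(passes):
        # Non-ASCII characters become \xNN / \uNNNN escapes here, so the
        # unicode-escape decoder restores them instead of misreading UTF-8 bytes.
        text = text.encode("ascii", "backslashreplace").decode("unicode-escape")
    return text


# Doubly escaped sample in the style of the log line, plus a literal euro sign.
sample = r'\"message\": \"\\u0410\\u0432 €\"'
print(decode_escapes(sample))
```

    Note that any genuine backslash in the data is also treated as the start of an escape, so this only suits input that is known to be escaped.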
    
  • 2020-11-22 02:13

    There is a library called ftfy that can help with Unicode issues. It has made my life easier.

    Example 1

    import ftfy
    print(ftfy.fix_text('ünicode'))
    
    output -->
    ünicode
    

    Example 2 - UTF-8

    import ftfy
    print(ftfy.fix_text('\xe2\x80\xa2'))
    
    output -->
    •
    

    Example 3 - Unicode code point

    import ftfy
    print(ftfy.fix_text(u'\u2026'))
    
    output -->
    …
    

    https://ftfy.readthedocs.io/en/latest/

    pip install ftfy

    https://pypi.org/project/ftfy/
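    For intuition about Example 2: the input there is UTF-8 bytes that were mistakenly decoded as Latin-1. A rough standard-library sketch of that single repair follows. It is not how ftfy is implemented, it covers only this one kind of damage, and `undo_latin1_mojibake` is a hypothetical name:

```python
def undo_latin1_mojibake(text: str) -> str:
    """Reverse UTF-8 bytes mis-decoded as Latin-1; leave other text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are not
        # valid UTF-8, so this particular repair does not apply.
        return text


print(undo_latin1_mojibake("\xe2\x80\xa2"))
```

    ftfy detects many more kinds of breakage and decides when a fix is safe, which is why a library is preferable to this one-liner in practice.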

  • 2020-11-22 02:15

    Here is an example (Python 2; in Python 3 the result is a bytes object, displayed with a b prefix):

    >>> u = u'€€€'
    >>> s = u.encode('utf8')
    >>> s
    '\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
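    In Python 3 every `str` is already Unicode, so the same idea is a round trip between `str` and `bytes`. A short sketch, assuming UTF-8 is the encoding you want:

```python
text = "€€€"

data = text.encode("utf-8")      # str -> bytes
print(data)

restored = data.decode("utf-8")  # bytes -> str
print(restored)
```

    Each euro sign takes three bytes in UTF-8, so `data` is nine bytes long while `text` is three characters long.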
    
  • 2020-11-22 02:19

    None of the answers worked in my case: I had a string variable containing Unicode escape sequences, and none of the encode/decode combinations explained here did the job.

    If I run this in a terminal:

    echo "no me llama mucho la atenci\u00f3n"
    

    or

    python3
    >>> print("no me llama mucho la atenci\u00f3n")
    

    The output is correct:

    output: no me llama mucho la atención
    

    But it did not work in scripts that load this string into a variable.

    This is what worked in my case, in case it helps anybody:

    import json

    # Raw literal: stands in for loaded text that holds a literal backslash-u sequence
    string_to_convert = r"no me llama mucho la atenci\u00f3n"
    print(json.dumps(json.loads(r'"%s"' % string_to_convert), ensure_ascii=False))
    output: "no me llama mucho la atención"
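    A likely reason the interactive test worked while the script did not: in a Python string literal the parser itself turns `\u00f3` into `ó`, whereas text loaded at runtime keeps the backslash and the following characters as six separate characters. A small illustration, using a raw literal to stand in for loaded text (an assumption about the poster's setup; the variable names are illustrative):

```python
import json

from_literal = "atenci\u00f3n"   # escape already processed by the Python parser
from_file = r"atenci\u00f3n"     # stand-in for text loaded at runtime

print(len(from_literal), len(from_file))

# json.loads understands \uXXXX once the text is wrapped in double quotes.
# Assumes the text contains no unescaped double quotes or stray backslashes.
decoded = json.loads('"%s"' % from_file)
print(decoded == from_literal)
```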
    