UnicodeDecodeError, invalid continuation byte

忘掉有多难 2020-11-22 08:25

Why is the below item failing? Why does it succeed with "latin-1" codec?

o = "a test of \xe9 char" #I want this to remain a string as this is what I am receiving
v = o.decode("utf-8")     #this line raises the UnicodeDecodeError
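
In Python 3 terms (the snippet above is Python 2, where a plain string literal is a byte string), a minimal sketch of the failure and of the latin-1 workaround:

data = b"a test of \xe9 char"   # the same bytes, as a Python 3 bytes literal

print(data.decode("latin-1"))   # works: latin-1 maps every byte 0-255 to a code point
try:
    data.decode("utf-8")        # 0xE9 opens a multibyte UTF-8 sequence...
except UnicodeDecodeError as e:
    print(e)                    # ...but the following space is not a valid continuation byte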


        
10 answers
  • 2020-11-22 08:54

    Because UTF-8 is a multibyte encoding, and there is no character corresponding to your combination of \xe9 plus the following space: 0xE9 announces a multibyte sequence, but the space is not a valid continuation byte.

    Why would you expect it to succeed in both utf-8 and latin-1?

    Here is how the same sentence looks in utf-8:

    >>> o.decode('latin-1').encode("utf-8")
    'a test of \xc3\xa9 char'
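
    A quick sketch of the byte-level reason: 0xE9 has the bit pattern 1110xxxx, which in UTF-8 starts a three-byte sequence whose following bytes must match 10xxxxxx, and the space that follows does not:

    print(f"{0xE9:08b}")        # 11101001 -> '1110' prefix: start of a 3-byte sequence
    print(f"{ord(' '):08b}")    # 00100000 -> lacks the '10' prefix required of a
                                #             continuation byte, hence the decode error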
    
  • 2020-11-22 08:56

    If pandas reports a UTF-8 decode error, use this:

    pd.read_csv('File_name.csv',encoding='latin-1')
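
    A hedged sketch building on this ("File_name.csv" is a placeholder path): try UTF-8 first and fall back to latin-1 only on failure, since latin-1 accepts any byte but may mangle characters if the file is really in some other encoding.

    import pandas as pd

    try:
        df = pd.read_csv("File_name.csv", encoding="utf-8")
    except UnicodeDecodeError:
        # latin-1 never fails, but double-check the text it produces
        df = pd.read_csv("File_name.csv", encoding="latin-1")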
    
  • 2020-11-22 08:57

    A utf-8 decode error usually appears when byte values fall outside the 0–127 ASCII range.

    The reason this exception is raised:

    1) If the code point is < 128, each byte is the same as the value of the code point.
    2) If the code point is 128 or greater, the Unicode string can't be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)

    To overcome this we have a set of encodings; the most widely used is Latin-1, also known as ISO-8859-1.

    Unicode code points 0–255 are identical to the Latin-1 byte values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can't be encoded into Latin-1.
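
    A small sketch of both directions of that rule:

    print("é".encode("latin-1"))    # b'\xe9' -- code point 0xE9 fits in one byte
    try:
        "€".encode("latin-1")       # U+20AC is larger than 255...
    except UnicodeEncodeError as e:
        print(e)                    # ...so Latin-1 cannot represent it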

    When this exception occurs while you are loading a dataset, try this format:

    df=pd.read_csv("top50.csv",encoding='ISO-8859-1')
    

    Adding the encoding argument at the end lets pandas load the dataset.

  • 2020-11-22 08:58

    It is invalid UTF-8. That character is the e-acute character in ISO-Latin1, which is why it succeeds with that codeset.

    If you don't know the codeset you're receiving strings in, you're in a bit of trouble. It would be best to choose a single codeset (hopefully UTF-8) for your protocol/application and then simply reject input that doesn't decode.

    If you can't do that, you'll need heuristics.
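
    For example, a minimal sketch using the third-party chardet package (one common heuristic; its guess is not a guarantee):

    import chardet  # pip install chardet

    raw = b"a test of \xe9 char"
    guess = chardet.detect(raw)      # e.g. {'encoding': 'ISO-8859-1', 'confidence': ...}
    text = raw.decode(guess["encoding"] or "latin-1")
    print(guess, text)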

  • 2020-11-22 08:59

    This happened to me too, while I was reading text containing Hebrew from a .txt file.

    I clicked File -> Save As and saved the file with UTF-8 encoding.
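
    After re-saving, it also helps to name the encoding explicitly when opening the file (a sketch; "hebrew.txt" is a placeholder name):

    with open("hebrew.txt", encoding="utf-8") as f:
        text = f.read()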

  • 2020-11-22 09:02

    In my case, I was trying to execute a .py that runs a path/file.sql.

    My solution was to change the encoding of the file.sql to "UTF-8 without BOM", and it works!

    You can do it with Notepad++.

    I'll leave part of my code (imports and the encoding argument added for completeness; path points at the .sql file):

    import sys
    import psycopg2

    con = psycopg2.connect(host=sys.argv[1], port=sys.argv[2], dbname=sys.argv[3],
                           user=sys.argv[4], password=sys.argv[5])
    cursor = con.cursor()
    # 'utf-8-sig' reads UTF-8 and also strips a BOM if one is present
    sqlfile = open(path, 'r', encoding='utf-8-sig')
