UnicodeDecodeError, invalid continuation byte

忘掉有多难 2020-11-22 08:25

Why is the below item failing? Why does it succeed with "latin-1" codec?

o = "a test of \xe9 char" #I want this to remain a string as this is what I am receiving
v = o.decode("utf-8")     #this line raises the UnicodeDecodeError
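
In Python 3 terms (the snippet above is Python 2, where a plain string literal is a byte string), a minimal sketch of the failure and of the latin-1 workaround:

data = b"a test of \xe9 char"   # the same bytes, as a Python 3 bytes literal

print(data.decode("latin-1"))   # works: latin-1 maps every byte 0-255 to a code point
try:
    data.decode("utf-8")        # 0xE9 opens a multibyte UTF-8 sequence...
except UnicodeDecodeError as e:
    print(e)                    # ...but the following space is not a valid continuation byte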


        
10 answers
  • 2020-11-22 08:54

    Because UTF-8 is a multibyte encoding, and there is no character corresponding to your combination of \xe9 plus the following space: 0xE9 announces a multibyte sequence, but the space is not a valid continuation byte.

    Why would you expect it to succeed in both utf-8 and latin-1?

    Here is how the same sentence looks in utf-8:

    >>> o.decode('latin-1').encode("utf-8")
    'a test of \xc3\xa9 char'
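
    A quick sketch of the byte-level reason: 0xE9 has the bit pattern 1110xxxx, which in UTF-8 starts a three-byte sequence whose following bytes must match 10xxxxxx, and the space that follows does not:

    print(f"{0xE9:08b}")        # 11101001 -> '1110' prefix: start of a 3-byte sequence
    print(f"{ord(' '):08b}")    # 00100000 -> lacks the '10' prefix required of a
                                #             continuation byte, hence the decode error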
    
  • 2020-11-22 08:56

    If pandas reports a UTF-8 decode error, use this:

    pd.read_csv('File_name.csv',encoding='latin-1')
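
    A hedged sketch building on this ("File_name.csv" is a placeholder path): try UTF-8 first and fall back to latin-1 only on failure, since latin-1 accepts any byte but may mangle characters if the file is really in some other encoding.

    import pandas as pd

    try:
        df = pd.read_csv("File_name.csv", encoding="utf-8")
    except UnicodeDecodeError:
        # latin-1 never fails, but double-check the text it produces
        df = pd.read_csv("File_name.csv", encoding="latin-1")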
    
  • 2020-11-22 08:57

    A utf-8 decode error usually appears when byte values fall outside the 0–127 ASCII range.

    The reason this exception is raised:

    1) If the code point is < 128, each byte is the same as the value of the code point.
    2) If the code point is 128 or greater, the Unicode string can't be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)

    To overcome this we have a set of encodings; the most widely used is Latin-1, also known as ISO-8859-1.

    Unicode code points 0–255 are identical to the Latin-1 byte values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can't be encoded into Latin-1.
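
    A small sketch of both directions of that rule:

    print("é".encode("latin-1"))    # b'\xe9' -- code point 0xE9 fits in one byte
    try:
        "€".encode("latin-1")       # U+20AC is larger than 255...
    except UnicodeEncodeError as e:
        print(e)                    # ...so Latin-1 cannot represent it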

    When this exception occurs while you are loading a dataset, try this format:

    df=pd.read_csv("top50.csv",encoding='ISO-8859-1')
    

    Adding the encoding argument at the end lets pandas load the dataset.

  • 2020-11-22 08:58

    It is invalid UTF-8. That character is the e-acute character in ISO-Latin1, which is why it succeeds with that codeset.

    If you don't know the codeset you're receiving strings in, you're in a bit of trouble. It would be best to choose a single codeset (hopefully UTF-8) for your protocol/application and then simply reject input that doesn't decode.

    If you can't do that, you'll need heuristics.
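
    For example, a minimal sketch using the third-party chardet package (one common heuristic; its guess is not a guarantee):

    import chardet  # pip install chardet

    raw = b"a test of \xe9 char"
    guess = chardet.detect(raw)      # e.g. {'encoding': 'ISO-8859-1', 'confidence': ...}
    text = raw.decode(guess["encoding"] or "latin-1")
    print(guess, text)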

  • 2020-11-22 08:59

    This happened to me too, while I was reading text containing Hebrew from a .txt file.

    I clicked File -> Save As and saved the file with UTF-8 encoding.
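
    After re-saving, it also helps to name the encoding explicitly when opening the file (a sketch; "hebrew.txt" is a placeholder name):

    with open("hebrew.txt", encoding="utf-8") as f:
        text = f.read()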

  • 2020-11-22 09:02

    In my case, I was trying to execute a .py that runs a path/file.sql.

    My solution was to change the encoding of the file.sql to "UTF-8 without BOM", and it works!

    You can do it with Notepad++.

    I'll leave part of my code (imports and the encoding argument added for completeness; path points at the .sql file):

    import sys
    import psycopg2

    con = psycopg2.connect(host=sys.argv[1], port=sys.argv[2], dbname=sys.argv[3],
                           user=sys.argv[4], password=sys.argv[5])
    cursor = con.cursor()
    # 'utf-8-sig' reads UTF-8 and also strips a BOM if one is present
    sqlfile = open(path, 'r', encoding='utf-8-sig')
