Python - read text file with weird utf-16 format

走了就别回头了 2021-01-17 17:00

I'm trying to read a text file into Python, but it seems to use some very strange encoding. I tried the usual:

file = open('data.txt', 'r')

lines = file.

4 answers

无人共我 (original poster) 2021-01-17 17:22

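Looks like UTF-16 to me: every character in the string is followed by a null byte (\x00), which is how UTF-16 little-endian stores characters below U+0100. In a Python 2 session: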

>>> test_utf16 = '0\x00.\x000\x002\x000\x000\x001\x009\x007\x00'
>>> test_utf16.decode('utf-16')
u'0.0200197'
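
On Python 3, where str has no decode() method, the same idea applies to a bytes object instead. The sketch below is a minimal equivalent using the sample bytes from the answer; the little-endian byte order is an assumption inferred from the null byte following each character, and it is named explicitly because no byte-order mark is present.

```python
# Python 3 sketch: data read in binary mode arrives as bytes, not str.
raw = b'0\x00.\x000\x002\x000\x000\x001\x009\x007\x00'

# Each ASCII character is followed by a zero byte, which is exactly what
# UTF-16 little-endian produces for characters below U+0100.
assert raw == '0.0200197'.encode('utf-16-le')

# No byte-order mark here, so state the byte order explicitly instead of
# relying on what plain 'utf-16' assumes when the BOM is missing.
text = raw.decode('utf-16-le')
print(text)  # 0.0200197
```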


You can work directly off the Unicode strings:

>>> float(test_utf16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: null byte in argument for float()
>>> float(test_utf16.decode('utf-16'))
0.020019700000000001


Or encode them to something different, if you prefer:

>>> float(test_utf16.decode('utf-16').encode('ascii'))
0.020019700000000001
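
The two conversion routes above can be sketched in Python 3 as follows, assuming the same little-endian sample bytes. Python 3's float() accepts both str and ASCII bytes, so either decoding to text or re-encoding to ASCII works; the raw UTF-16 bytes are rejected with a ValueError (the exact message differs from Python 2's).

```python
raw = b'0\x00.\x000\x002\x000\x000\x001\x009\x007\x00'

# Parsing the raw bytes fails because of the embedded zero bytes.
try:
    float(raw)
except ValueError as exc:
    print('raw bytes rejected:', exc)

# Route 1: decode to text first, then convert.
value = float(raw.decode('utf-16-le'))
print(value)  # 0.0200197

# Route 2: re-encode the decoded text as ASCII bytes; float() accepts those too.
ascii_bytes = raw.decode('utf-16-le').encode('ascii')
print(float(ascii_bytes))  # 0.0200197
```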


Note that you need to do this as early as possible in your processing. As your comment noted, split() behaves incorrectly on the UTF-16-encoded form: in UTF-16 little-endian the space character ' ' is stored as ' \x00', so split() removes the space but leaves the null byte behind.
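
A small Python 3 illustration of that split() pitfall, using a hypothetical two-number line encoded the same way as the data: splitting the bytes strands the space's trailing zero byte at the start of the next field, while splitting after decoding gives clean fields.

```python
# Hypothetical sample line, encoded as UTF-16 little-endian like the file.
raw = '1.5 2.5'.encode('utf-16-le')

# The space is stored as b' \x00'; split() removes b' ' but keeps b'\x00',
# which ends up at the front of the second field.
split_bytes = raw.split()
print(split_bytes)  # [b'1\x00.\x005\x00', b'\x002\x00.\x005\x00']

# Decoding first gives clean fields that float() can parse.
split_text = raw.decode('utf-16-le').split()
print([float(field) for field in split_text])  # [1.5, 2.5]
```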

In Python 2.6 and later, the io module can do this decoding for you as the file is read, as can the older codecs module. io handles line endings better, so it is preferable when available.
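
A minimal sketch of reading the file with io.open, assuming the file is UTF-16 with a byte-order mark (whether the question's data.txt has one is not visible in the post). The helper name read_utf16_lines is hypothetical, and the demo writes to a temporary file rather than touching data.txt.

```python
import io
import os
import tempfile


def read_utf16_lines(path, encoding='utf-16'):
    """Return the lines of a UTF-16 text file as decoded strings.

    'utf-16' reads the byte-order mark at the start of the file; if the
    file has no BOM, pass encoding='utf-16-le' (or 'utf-16-be') instead.
    codecs.open(path, 'r', encoding) also decodes, but it always opens the
    file in binary mode, so Windows line endings keep their carriage return;
    io.open translates them to plain newlines.
    """
    with io.open(path, 'r', encoding=encoding) as f:
        return [line.rstrip('\n') for line in f]


if __name__ == '__main__':
    # Write a small UTF-16 sample (with BOM) to a temporary directory.
    sample = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    with io.open(sample, 'w', encoding='utf-16') as f:
        f.write(u'0.0200197 0.5\n1.25 2.0\n')

    # Fields come back as clean text, so split() and float() behave normally.
    for line in read_utf16_lines(sample):
        print([float(field) for field in line.split()])
```

Letting 'utf-16' consume the BOM avoids hard-coding a byte order; the explicit 'utf-16-le' form is only needed for BOM-less files like the sample bytes shown earlier in this answer.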