Reading non-ASCII characters from a text file

孤独总比滥情好 2021-01-06 00:03

I'm using Python 2.7. I've tried many things, such as the codecs module, but nothing worked. How can I fix this?

myfile.txt

wörd

My code

3 Answers
  • 2021-01-06 00:40

    The problem is the terminal encoding. Configure your terminal to use the same encoding as your file; I recommend UTF-8.

    It is also good practice to explicitly decode all input and encode all output to avoid these problems:

    # Python 2.7
    with open('myfile.txt', 'r') as f:
        for line in f:
            l = unicode(line, encoding='utf-8').rstrip(u'\n')  # decode the input bytes to unicode
            print l.encode('utf-8')  # encode the output back to UTF-8 bytes
    
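The first answer's decode-on-input, encode-on-output advice can also be written so it runs unchanged on Python 2.7 and 3, using `io.open`, which decodes for you when given an encoding. This is a minimal sketch, assuming the file is UTF-8; the helper names `read_text` and `terminal_encoding` are hypothetical. The second helper reports `sys.stdout.encoding`, which is where a terminal-encoding mismatch shows up.

```python
import io
import os
import sys
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file and decode it to a unicode string (UTF-8 assumed)."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def terminal_encoding():
    """Return the encoding stdout will use for printing text.

    On Python 2, None (e.g. output piped to a file) or an ASCII codec means
    printing non-ASCII unicode is likely to raise UnicodeEncodeError.
    """
    return getattr(sys.stdout, "encoding", None)


if __name__ == "__main__":
    # Create a sample file holding the word from the question, then read it back.
    path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"w\u00f6rd\n")
    print(repr(read_text(path)))
    print(terminal_encoding())
```

Because `io.open` handles decoding, the loop body never touches raw bytes; only the final print depends on the terminal's encoding.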
  • 2021-01-06 00:43
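    1. First, detect the file's encoding:
    
      from chardet import detect  # third-party package: pip install chardet
      def encoding(data):
          return detect(data)['encoding']  # best guess; may be None
      with open('myfile.txt', 'rb') as f:
          line = f.read()  # raw bytes
      print encoding(line)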
    
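    2. Then convert it to unicode (or to a str in your default encoding):
    
      enc = encoding(line) or 'utf-8'  # fall back to UTF-8 if chardet cannot guess
      n_line = unicode(line, enc, errors='ignore')  # 'ignore' silently drops undecodable bytes
      print n_line
      print n_line.encode('utf8')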
    
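The second answer relies on chardet's guess, which can be `None` or wrong for short inputs like a single word. A deterministic alternative (a swapped-in technique, not chardet) is to try a fixed list of candidate encodings in order with strict decoding. This is a sketch under assumptions: the helper `decode_with_fallback` and its candidate list are hypothetical. It also shows that `errors='ignore'` silently loses characters.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes data strictly.

    The candidate list is an assumption. latin-1 maps every byte value, so as
    the last entry it never fails, though it may yield the wrong characters.
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


if __name__ == "__main__":
    utf8_bytes = u"w\u00f6rd".encode("utf-8")      # b'w\xc3\xb6rd'
    latin1_bytes = u"w\u00f6rd".encode("latin-1")  # b'w\xf6rd'
    print(decode_with_fallback(utf8_bytes)[1])     # utf-8
    print(decode_with_fallback(latin1_bytes)[1])   # cp1252
    # errors='ignore' drops the two bytes of the o-umlaut instead of raising:
    print(repr(utf8_bytes.decode("ascii", "ignore")))
```

UTF-8 goes first because its strict decoder rejects most non-UTF-8 byte sequences, so a successful decode is strong evidence; the single-byte codecs accept almost anything and belong at the end.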
  • 2021-01-06 00:46
    import codecs
    
    # open the file, decoding it as UTF-8
    with codecs.open("myfile.txt", "r", encoding='utf-8') as f:
        # read the whole file into a unicode string
        sfile = f.read()
    
    # check the type
    print type(sfile)  # <type 'unicode'>
    
    # encode the unicode string to a byte string (str) so the terminal displays it properly
    print sfile.encode('utf-8')
    
    # check the type of the encoded string
    print type(sfile.encode('utf-8'))  # <type 'str'>
    
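The `type()` checks in the last answer hinge on the difference between a unicode (text) string and an encoded byte string. A minimal sketch, using the word from the question, of how the two relate; it runs on Python 2.7 and 3, though the type names printed differ between versions.

```python
# The same word as a unicode (text) string and as UTF-8 encoded bytes.
text = u"w\u00f6rd"          # 4 characters: w, o-umlaut, r, d
data = text.encode("utf-8")  # 5 bytes: the o-umlaut takes two bytes (0xC3 0xB6)

print(len(text))             # 4
print(len(data))             # 5
print(type(text))            # unicode on Python 2, str on Python 3
print(type(data))            # str on Python 2, bytes on Python 3

# Decoding with the same encoding reverses the encoding exactly.
assert data.decode("utf-8") == text
```

Decoding with a different encoding would not round-trip: reading those 5 bytes as latin-1 produces 5 characters instead of the original 4.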