Reading non-ASCII characters from a text file

前端未结


I'm using Python 2.7. I've tried many things, such as codecs, but nothing worked. How can I fix this?

myfile.txt

wörd

My code

3 Answers

心在旅途

2021-01-06 00:40

This is a terminal encoding problem. Configure your terminal to use the same encoding as your file. I recommend UTF-8.
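To see which encodings Python thinks are in effect before changing any terminal settings, a small check like the following may help. It is only a sketch: the helper name `describe_encodings` is made up for illustration, and the values it reports depend entirely on the local environment.

```python
from __future__ import print_function
import locale
import sys


def describe_encodings():
    """Return the encodings Python believes are in effect (hypothetical helper)."""
    return {
        # May be None when output is piped or redirected under Python 2.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the user's locale settings.
        "preferred": locale.getpreferredencoding(),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


info = describe_encodings()
for name in sorted(info):
    print(name, "=", info[name])
```

If the `stdout` value differs from the encoding of the file, printing non-ASCII text is likely to produce garbled characters or a `UnicodeEncodeError`.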

By the way, it is good practice to decode all input and encode all output to avoid such problems:

f = open('test.txt', 'r')
for line in f:
    l = unicode(line, encoding='utf-8')  # decode the input
    print l.encode('utf-8')  # encode the output
f.close()
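The same decode-on-input, encode-on-output idea can also be written with `io.open`, which decodes while reading and behaves the same way on Python 2.7 and Python 3. The sketch below assumes the file is UTF-8 and creates its own temporary sample file (`sample.txt` is a stand-in, not the asker's `myfile.txt`).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import io
import os
import tempfile

# Create a small UTF-8 sample file (a stand-in for the real input file).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write("w\u00f6rd\n")

# io.open decodes on read, so each line is already unicode text.
with io.open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

word = lines[0]            # unicode text: 4 characters
raw = word.encode("utf-8")  # bytes: 5 bytes, because "ö" takes two in UTF-8

print(len(word), len(raw))  # prints: 4 5
```

The difference between the two lengths is a quick way to confirm that the text was really decoded rather than passed through as raw bytes.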


时光取名叫无心

2021-01-06 00:43

First, detect the file's encoding:


  from chardet import detect  # third-party package
  encoding = lambda x: detect(x)['encoding']
  line = open('myfile.txt', 'rb').readline()  # raw bytes from the file
  print encoding(line)

Then convert it to unicode, or to a str in the encoding you want:


  n_line = unicode(line, encoding(line), errors='ignore')  # silently drops undecodable bytes
  print n_line
  print n_line.encode('utf8')
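If installing chardet is not an option, a cruder alternative is to try a short list of candidate encodings in order. This is a different technique from statistical detection, not a replacement for it: the function name `decode_with_fallback` and the candidate list are assumptions chosen for illustration, and `latin-1` accepts any byte sequence, so it only makes sense as a last resort.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def decode_with_fallback(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode raw bytes with the first candidate encoding that succeeds.

    Returns a (text, encoding_name) tuple. Hypothetical helper.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


utf8_bytes = b"w\xc3\xb6rd"   # "wörd" encoded as UTF-8
cp1252_bytes = b"w\xf6rd"     # "wörd" encoded as cp1252 / latin-1

text1, enc1 = decode_with_fallback(utf8_bytes)
text2, enc2 = decode_with_fallback(cp1252_bytes)

print(enc1, enc2, text1 == text2)  # prints: utf-8 cp1252 True
```

Trying UTF-8 first is deliberate: text in a single-byte encoding that contains non-ASCII characters is rarely valid UTF-8, so a successful UTF-8 decode is a fairly strong signal.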


野趣味

2021-01-06 00:46

import codecs
# open the file with UTF-8 decoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# read the whole file into a unicode string
sfile = f.read()
f.close()

# check the type of the decoded text
print type(sfile)  # <type 'unicode'>

# encode the unicode string back to a byte string to display it properly
print sfile.encode('utf-8')

# check the type of the encoded string
print type(sfile.encode('utf-8'))  # <type 'str'>
