I\'m trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:
Just to add in case file = open(filename, encoding="utf8")
does not work
try file = open(filename, errors='ignore')
TLDR? Try: file = open(filename, encoding='cp437)
Why? When one use:
file = open(filename)
text = file.read()
Python assumes the file uses the same codepage as current environment (cp1252 in case of the opening post) and tries to decode it to its own default UTF-8. If the file contains characters of values not defined in this codepage (like 0x90) we get UnicodeDecodeError. Sometimes we don't know the encoding of the file, sometimes the file's encoding may be unhandled by Python (like e.g. cp790), sometimes the file can contain mixed encodings.
If such characters are unneeded, one may decide to replace them by question marks, with:
file = open(filename, errors='replace')
Another workaround is to use:
file = open(filename, errors='ignore')
The characters are then left intact, but other errors will be masked too.
Quite good solution is to specify the encoding, yet not any encoding (like cp1252), but the one which has ALL characters defined (like cp437):
file = open(filename, encoding='cp437')
Codepage 437 is the original DOS encoding. All codes are defined, so there are no errors while reading the file, no errors are masked out, the characters are preserved (not quite left intact but still distinguishable).
As an extension to @LennartRegebro's answer:
If you can't tell what encoding your file uses and the solution above does not work (it's not utf8
) and you found yourself merely guessing - there are online tools that you could use to identify what encoding that is. They aren't perfect but usually work just fine. After you figure out the encoding you should be able to use solution above.
EDIT: (Copied from comment)
A quite popular text editor Sublime Text
has a command to display encoding if it has been set...
View
-> Show Console
(or Ctrl+`)
view.encoding()
and hope for the best (I was unable to get anything but Undefined
but maybe you will have better luck...)
For those working in Anaconda in Windows, I had the same problem. Notepad++ help me to solve it.
Open the file in Notepad++. In the bottom right it will tell you the current file encoding. In the top menu, next to "View" locate "Encoding". In "Encoding" go to "character sets" and there with patiente look for the enconding that you need. In my case the encoding "Windows-1252" was found under "Western European"
Stop wasting your time, just add the following encoding="cp437"
and errors='ignore'
to your code in both read and write:
open('filename.csv', encoding="cp437", errors='ignore')
open(file_name, 'w', newline='', encoding="cp437", errors='ignore')
Godspeed
Alternatively if you don't need to decode the file, such as uploading the file to a website, open(filename, 'rb')
where r = reading, b = binary