I have the following code that search through files using RE's and if any matches are found it move the file into a different directory.
import os
import gzip
import re
import shutil
def regEx1():
os.chdir("C:/Users/David/myfiles")
files = os.listdir(".")
os.mkdir("C:/Users/David/NewFiles")
regex_txt = input("Please enter the string your are looking for:")
for x in (files):
inputFile = open((x), "r")
content = inputFile.read()
inputFile.close()
regex = re.compile(regex_txt, re.IGNORECASE)
if re.search(regex, content)is not None:
shutil.copy(x, "C:/Users/David/NewFiles")
When I run it i get the following error message:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python33\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 367: character maps to <undefined>
Please could someone explain why this message appears
In python 3, when you open a file for reading in text mode (r
) it'll decode the contained text to unicode.
Since you didn't specify what encoding to use to read the file, the platform default (from locale.getpreferredencoding
) is being used, and that fails in this case.
You need to either specify an encoding that can decode the file contents, or open the file in binary mode instead (and use b''
bytes patterns for your regular expressions).
See the Python Unicode HOWTO for more information.
I'm not too familiar with python 3x, but the below may work.
inputFile = open((x, encoding="utf8"), "r")
There's a similar question here: Python: Traceback codecs.charmap_decode(input,self.errors,decoding_table)[0]
But you might want to try:
open((x), "r", encoding='UTF8')
Thank you very much for this solution. It helps me for another subject, I used :
exec (open ("DIP6.py").read ())
and I got this error because I have this symbol in a comment of DIP6.py :
# ● en première colonne
It works fine with :
exec (open ("DIP6.py", encoding="utf8").read ())
It also solves a problem with :
print("été") for example
in DIP6.py
I got :
été
in the console.
Thank you :-) .
来源:https://stackoverflow.com/questions/14242486/python-why-am-i-getting-a-unicodedecodeerror