Question
I have a Python 3 program that monitors a log file. The log includes, among other things, chat messages written by users. The log is created by a third-party application which I cannot change.
Today a user wrote a message containing a character sequence that could not be decoded, and it caused the program to crash with the following error:
future: <Task finished coro=<updateConsoleLog() done, defined at /usr/local/src/bserver/logmonitor.py:48> exception=UnicodeDecodeError('utf-8',...
say "\xed\xa0\xbd\xed\xb1\x8c"\r\n', 7623, 7624, 'invalid continuation byte')>
Traceback (most recent call last):
File "/usr/lib/python3.4/asyncio/tasks.py", line 238, in _step
result = next(coro)
File "/usr/local/src/bserver/logmonitor.py", line 50, in updateConsoleLog
server_events = self.console.getUpdate()
File "/usr/local/src/bserver/console.py", line 79, in getUpdate
return self.read()
File "/usr/local/src/bserver/console.py", line 90, in read
for line in itertools.islice(log_file, log_no, None):
File "/usr/lib/python3.4/codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7623: invalid continuation byte
ERROR:asyncio:Task exception was never retrieved
Using 'file -i log.file' I determined that the log file is us-ascii. This shouldn't be an issue, as ASCII is a subset of UTF-8 (as far as I know).
Since this happens rarely and I don't mind losing whatever it is that this user typed, is it possible for me to ignore this line or the particular characters that can't be decoded and just keep on reading the rest of the file?
I considered using try: ... except UnicodeDecodeError as ..., but this would mean I can't read anything in the log file after the error.
Code
def read(self):
    log_no = self.last_log_no
    log_file = open(self.path, 'r')
    server_events = []
    starting_log_no = log_no
    for line in itertools.islice(log_file, log_no, None):  # ERROR
        server_events.append(line)
        print(line.replace('\n', '').replace('\r', ''))
        log_no += 1
    self.last_log_no = log_no
    if (starting_log_no < log_no):
        return server_events
    return False
Any help or advice would be appreciated!
Answer 1:
The byte string \xed\xa0\xbd\xed\xb1\x8c is not valid UTF-8. Neither is it us-ascii, since us-ascii characters are only 7 bits long; for example, \x8c is greater than 127.
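The point about the byte string can be checked directly. The following is a minimal sketch that tries to decode the bytes from the traceback with both codecs; the helper name try_decode is made up for this illustration and is not part of the original program.

```python
# The byte sequence reported in the traceback.
raw = b"\xed\xa0\xbd\xed\xb1\x8c"

def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or a failure note."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason

utf8_result = try_decode(raw, "utf-8")    # strict UTF-8 rejects these bytes
ascii_result = try_decode(raw, "ascii")   # ASCII rejects any byte above 127

print(utf8_result)
print(ascii_result)
print(all(byte > 127 for byte in raw))    # every byte is outside the 7-bit range
```

Both decodes fail, which is consistent with the file looking like us-ascii to 'file -i' until a line like this one appears.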
Instead of ignoring the UnicodeDecodeError, try opening the file with an encoding that supports all 8 bits of a byte (e.g. latin-1):
log_file = open(self.path, 'r', encoding='latin-1')
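As a rough sketch of how this behaves with itertools.islice, the example below writes a small made-up log to a temporary file and reads it back. The sample content and the read_lines helper are assumptions for illustration only. It also shows one alternative not mentioned above: keeping UTF-8 but passing errors='replace' to open(), which substitutes U+FFFD for undecodable bytes instead of raising.

```python
import itertools
import os
import tempfile

# Made-up log content: a clean line, the problematic chat line, a clean line.
sample = b'first line\r\nsay "\xed\xa0\xbd\xed\xb1\x8c"\r\nlast line\r\n'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

def read_lines(path, start, **open_kwargs):
    """Hypothetical helper: read lines from index `start` to the end of the file."""
    with open(path, "r", **open_kwargs) as log_file:
        return [line.rstrip("\r\n")
                for line in itertools.islice(log_file, start, None)]

# latin-1 maps every byte value to a character, so decoding cannot fail.
latin1_lines = read_lines(path, 0, encoding="latin-1")

# Alternative (an assumption, not from the answer): stay with UTF-8 and
# replace undecodable bytes with U+FFFD rather than raising.
replaced_lines = read_lines(path, 0, encoding="utf-8", errors="replace")

os.remove(path)

print(latin1_lines[-1])
print(replaced_lines[-1])
```

With either approach the loop reaches the lines after the bad one. With latin-1, any genuine multi-byte UTF-8 text in the log will come out as mojibake, so which option fits depends on whether the rest of the log needs to be decoded faithfully.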
Source: https://stackoverflow.com/questions/33783653/python-3-itertools-islice-continue-despite-unicodedecodeerror