Python 3 itertools.islice continue despite UnicodeDecodeError

Posted by 拜拜、爱过 on 2021-01-28 04:01:02

Question


I have a Python 3 program that monitors a log file. The log includes, among other things, chat messages written by users. The log is created by a third-party application which I cannot change.

Today a user wrote "텋��텋��" and it caused the program to crash with the following error:

future: <Task finished coro=<updateConsoleLog() done, defined at /usr/local/src/bserver/logmonitor.py:48> exception=UnicodeDecodeError('utf-8',...
say "\xed\xa0\xbd\xed\xb1\x8c"\r\n', 7623, 7624, 'invalid continuation byte')>
Traceback (most recent call last):
File "/usr/lib/python3.4/asyncio/tasks.py", line 238, in _step
result = next(coro)
File "/usr/local/src/bserver/logmonitor.py", line 50, in updateConsoleLog
server_events = self.console.getUpdate()
File "/usr/local/src/bserver/console.py", line 79, in getUpdate
return self.read()
File "/usr/local/src/bserver/console.py", line 90, in read
for line in itertools.islice(log_file, log_no, None):
File "/usr/lib/python3.4/codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7623: invalid continuation byte
ERROR:asyncio:Task exception was never retrieved

Using 'file -i log.file' I determined that the log file is us-ascii. This shouldn't be an issue, as ASCII is a subset of UTF-8 (as far as I know).

Since this happens rarely and I don't mind losing whatever it is that this user typed, is it possible for me to ignore this line or the particular characters that can't be decoded and just keep on reading the rest of the file?

I considered using try: ... except UnicodeDecodeError as ..., but this would mean I can't read anything in the log file after the error.

Code

def read(self):
    log_no = self.last_log_no
    log_file = open(self.path, 'r')
    server_events = []
    starting_log_no = log_no
    for line in itertools.islice(log_file, log_no, None):  # ERROR
        server_events.append(line)
        print(line.replace('\n', '').replace('\r', ''))

        log_no += 1
        self.last_log_no = log_no
    if (starting_log_no < log_no):
        return server_events
    return False

Any help or advice would be appreciated!


Answer 1:


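The byte string \xed\xa0\xbd\xed\xb1\x8c is not valid UTF-8: \xed\xa0\xbd and \xed\xb1\x8c are the byte patterns of the UTF-16 surrogates U+D83D and U+DC4C, which UTF-8 does not allow. It is not us-ascii either, because us-ascii is a 7-bit encoding and every byte here (for example \x8c) is greater than 127.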
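
To make this concrete, here is a small check of how the three codecs discussed in this thread treat those six bytes. It is only an illustration; the variable names `data`, `rejected` and `as_latin1` are made up for the example and are not part of the original program.

```python
# The six bytes from the traceback in the question.
data = b"\xed\xa0\xbd\xed\xb1\x8c"

# Collect the codecs that reject the bytes under the default (strict) error handling.
rejected = []
for codec in ("ascii", "utf-8", "latin-1"):
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        rejected.append(codec)
        print(codec, "rejected the bytes:", exc.reason)
    else:
        print(codec, "decoded the bytes without error")

# latin-1 maps every byte to exactly one character, so it never fails
# and the decoded string has the same length as the byte string.
as_latin1 = data.decode("latin-1")
print(len(as_latin1), [hex(ord(ch)) for ch in as_latin1])
```

Only latin-1 accepts the bytes, but what it returns is six unrelated Latin-1 characters rather than whatever the user originally typed.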

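Instead of ignoring the UnicodeDecodeError, try opening the file with an encoding that assigns a character to every possible byte value (e.g. latin-1). Decoding can then never fail, although non-ASCII text may come out as the wrong characters:

log_file = open(self.path, 'r', encoding='latin-1')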
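
If the goal is literally what the question asks for, dropping or masking the undecodable characters while still reading the rest of the file, another option (not part of this answer, so treat it as a sketch) is the `errors` argument of the built-in `open()`. With `errors='replace'` each undecodable byte becomes U+FFFD, and with `errors='ignore'` it is dropped. The helper `read_new_lines` and the temporary demo file below are hypothetical and exist only to make the example runnable.

```python
import itertools
import os
import tempfile


def read_new_lines(path, start_line, encoding="utf-8", errors="replace"):
    """Return the lines from start_line onward (hypothetical helper).

    Undecodable bytes are handled according to `errors` instead of raising
    UnicodeDecodeError, so iteration continues past the bad line.
    """
    with open(path, "r", encoding=encoding, errors=errors) as log_file:
        return list(itertools.islice(log_file, start_line, None))


# Build a throwaway log file containing the problematic bytes from the question.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b'first line\r\nsay "\xed\xa0\xbd\xed\xb1\x8c"\r\nlast line\r\n')
    demo_path = tmp.name

try:
    lines = read_new_lines(demo_path, 0)
finally:
    os.remove(demo_path)

for line in lines:
    print(repr(line))
```

The line after the bad one is still returned, which is the behaviour the question was after. Compared with latin-1, this keeps valid UTF-8 text intact and only damages the bytes that could not be decoded.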


Source: https://stackoverflow.com/questions/33783653/python-3-itertools-islice-continue-despite-unicodedecodeerror
