Asyncio and pyzmq - 'utf-8' codec can't decode byte 0xff in position 0

Submitted by 让人想犯罪 on 2019-12-13 03:45:30

Question


I have an asyncio server, which is an example from the TCP docs. However, I'm connecting to it using pyzmq, and when the reader on the server tries to read, I get a decode error. Any hint is highly appreciated. I've already tried encoding to UTF-8 first, but that didn't help.

Server: (Python 3.6)

import asyncio

async def handle_echo(reader, writer):
    data = await reader.read(100)
    print(data)
    message = data.decode()


loop = asyncio.get_event_loop()
coro = asyncio.start_server(handle_echo, '127.0.0.1', 5555, loop=loop)
server = loop.run_until_complete(coro)
loop.run_forever()

Client: (Python 2.7)

import zmq
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect ("tcp://localhost:%s" % 5555)
socket.send("test")

Full Trace:

    future: <Task finished coro=<handle_echo() done, defined at "E:\Projects\AsyncIOserver.py:3> exception=UnicodeDecodeError('utf-8', b'\xff\x00\x00\x00\x00\x00\x00\x00\x01\x7f', 0, 1, 'invalid start byte')>
Traceback (most recent call last):
  File "E:\Projects\AsyncIOserver.py", line 6, in handle_echo
    message = data.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Answer 1:


ZeroMQ uses the ZMTP protocol. It is a binary protocol, so you won't be able to decode the raw bytes as text directly.

If you're curious about it, inspect the ZMTP frames using Wireshark and the ZMTP plugin.

You can see that the bytes you got actually correspond to the signature of the ZMTP greeting message.
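
As a rough illustration, the ten bytes shown in the traceback can be checked against the layout of the ZMTP greeting signature (0xFF, eight padding bytes, then 0x7F). The helper name `looks_like_zmtp_signature` is hypothetical and not part of pyzmq or asyncio; this is only a sketch for eyeballing the data.

```python
# Bytes copied from the UnicodeDecodeError in the question's traceback.
received = b'\xff\x00\x00\x00\x00\x00\x00\x00\x01\x7f'


def looks_like_zmtp_signature(data: bytes) -> bool:
    """Return True if data starts with a ZMTP greeting signature.

    The signature is 10 bytes long: 0xFF, 8 padding bytes, 0x7F.
    (Hypothetical helper, for illustration only.)
    """
    return len(data) >= 10 and data[0] == 0xFF and data[9] == 0x7F


print(looks_like_zmtp_signature(received))  # True
print(looks_like_zmtp_signature(b"test"))   # False
```

So the server is not receiving "test" at all at this point; it is receiving the start of the ZMTP handshake.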


In order to receive the messages from a ZMQ socket in asyncio, use a dedicated project like aiozmq:

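import aiozmq
import asyncio

async def main(port=5555):
    bind = "tcp://*:%s" % port
    # A REP stream speaks ZMTP, so the greeting and framing are handled for us.
    rep = await aiozmq.create_zmq_stream(aiozmq.zmq.REP, bind=bind)
    # read() returns the list of message frames; one frame is expected here.
    message, = await rep.read()
    print(message.decode())
    # Echo the message back to the REQ client.
    rep.write([message])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()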



Answer 2:


The byte 0xff is the first byte of a little-endian UTF-16 BOM. It has no place in a UTF-8 stream, where the maximum number of 1-bits at the start of the first byte of a code point is four.
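
The leading-bits rule can be checked with a few lines of Python. This is a minimal sketch; `leading_ones` is a hypothetical helper written for this illustration, not a standard library function.

```python
def leading_ones(byte: int) -> int:
    """Count the consecutive 1-bits at the top of an 8-bit value."""
    count = 0
    mask = 0x80
    while mask and byte & mask:
        count += 1
        mask >>= 1
    return count


print(leading_ones(0xE2))  # 3: lead byte of a three-byte sequence
print(leading_ones(0xFF))  # 8: more than four, so never a valid UTF-8 byte

try:
    b'\xff'.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte
```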

See an earlier answer of mine for more detail on the UTF-8 encoding.

As to fixing it, you'll need to receive what was sent. That will involve either fixing the transmission side to do UTF-8, or the reception side to do UTF-16.

You may want to look into the differences between strings in Python 2 and 3, as this may well be what's causing your issue (see here).
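
For reference, a small sketch of the Python 3 side of that difference: `str` holds text and `bytes` holds raw data, and only `bytes` travel over a socket. In Python 2, a plain literal such as "test" is already a byte string. The variable names below are illustrative only.

```python
text = "test"
payload = text.encode("utf-8")  # str -> bytes, the form a socket transmits

print(type(text).__name__, type(payload).__name__)  # str bytes
print(payload.decode("utf-8") == text)               # True
```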



Source: https://stackoverflow.com/questions/48492913/asyncio-and-pyzmq-utf-8-codec-cant-decode-byte-0xff-in-position-0
