App crashes when it takes too long to reply in a ZMQ REQ/REP pattern

Question

I am writing a plugin that interfaces with a desktop application through ZeroMQ's REQ/REP (request-reply) pattern. I can currently receive a request, but the application seemingly crashes if a reply is not sent quickly enough.

I receive the request on a spawned thread and put it in a queue. The queue is processed on another thread, by a function that the application invokes periodically.

The message is received and processed correctly, but the response cannot be sent until the next iteration of the function, as I cannot get the data from the application until then.

When the function waits until its next iteration to send the response, the application crashes. However, if I send fake data as the response in the first iteration, soon after receiving the request, the application does not crash.

Constructing the socket

    zmq::socket_t socket(m_context, ZMQ_REP);
    socket.bind("tcp://*:" + std::to_string(port));

Receiving the message in the spawned thread

void ZMQReceiverV2::receiveRequests() {
    nInfo(*m_logger) << "Preparing to receive requests";
    while (m_isReceiving) {
        zmq::message_t zmq_msg;
        bool ok = m_respSocket.recv(&zmq_msg, ZMQ_NOBLOCK);
        if (ok) {
            // msg_str will be a binary string
            std::string msg_str;
            msg_str.assign(static_cast<char *>(zmq_msg.data()), zmq_msg.size());
            nInfo(*m_logger) << "Received the message: " << msg_str;
            std::pair<std::string, std::string> pair("", msg_str);
            // adding to message queue
            m_mutex.lock();
            m_messages.push(pair);
            m_mutex.unlock();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    nInfo(*m_logger) << "Done receiving requests";
}

Processing function on a separate thread

void ZMQReceiverV2::exportFrameAvailable() {
    // checking messages
    // if the queue is not empty
    m_mutex.lock();
    if (!m_messages.empty()) {
        nInfo(*m_logger) << "Reading message in queue";
        smart_target::SMARTTargetCreateRequest id_msg;
        std::pair<std::string, std::string> pair = m_messages.front();
        std::string topic   = pair.first;
        std::string msg_str = pair.second;
        processMsg(msg_str);
        // removing just read message
        m_messages.pop();
        // m_respSocket.send(zmq::message_t());  // won't crash if I reply here in this invocation
    }
    m_mutex.unlock();

    // sending back the ID that has just been made, for it to be mapped
    if (timeToSendReply()) {
        sendReply();  // will crash if I wait for this to be executed on the next invocation
    }
}

My research suggests that there is no time limit for sending the response, so this apparent timing issue is strange.

Is there something that I am missing that will let me send the response on the second iteration of the processing function?

Revision 1:

I have edited my code so that the responding socket only ever exists on one thread. Since I need information from the processing function in order to reply, I created another queue, which is checked in the revised function running on its own thread.

void ZMQReceiverV2::receiveRequests() {
    zmq::socket_t socket = setupBindSocket(ZMQ_REP, 5557, "responder");
    nInfo(*m_logger) << "Preparing to receive requests";
    while (m_isReceiving) {
        zmq::message_t zmq_msg;
        bool ok = socket.recv(&zmq_msg, ZMQ_NOBLOCK);
        if (ok) {
            // does not crash if I call send helper here
            // msg_str will be a binary string
            std::string msg_str;
            msg_str.assign(static_cast<char *>(zmq_msg.data()), zmq_msg.size());
            NLogger::nInfo(*m_logger) << "Received the message: " << msg_str;
            std::pair<std::string, std::string> pair("", msg_str);
            // adding to message queue
            m_mutex.lock();
            m_messages.push(pair);
            m_mutex.unlock();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        if (!sendQueue.empty()) {
            sendEntityCreationMessage(socket, sendQueue.front());
            sendQueue.pop();
        }
    }
    nInfo(*m_logger) << "Done receiving requests";
    socket.close();
}

The function sendEntityCreationMessage() is a helper function that ultimately calls socket.send().

void ZMQReceiverV2::sendEntityCreationMessage(zmq::socket_t &socket, NUniqueID id) {
    socket.send(zmq::message_t());
}

This code seems to follow the thread-safety guidelines for sockets. Any suggestions?

Answer 1:

Q: "Is there something that I am missing?"

Yes. The ZeroMQ philosophy, known as the Zen-of-Zero, has always promoted three rules: never share a socket instance, never block, and never expect the world to act as one wishes.

That said, a socket instance should be touched only by the thread that instantiated and owns it, never by any other thread.
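
One way to keep a socket on its owning thread is to pass data between threads through mutex-protected queues instead. The following is a minimal sketch, assuming C++11 and only the standard library; it makes no ZeroMQ calls. The names HandoffQueue, Channels, processOneRequest and demoHandoff are hypothetical and do not come from the question's code. Whether the sendQueue in Revision 1 is protected by a lock is not visible in the posted code, so the sketch guards both directions.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical helper, not part of ZeroMQ or the original code:
// a mutex-protected queue for handing data from one thread to another.
template <typename T>
class HandoffQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push(std::move(value));
    }

    // Moves the oldest item into `out` and returns true,
    // or returns false if the queue is empty.
    bool tryPop(T &out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty()) {
            return false;
        }
        out = std::move(m_items.front());
        m_items.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::queue<T> m_items;
};

// Hypothetical wiring: one queue per direction.
struct Channels {
    HandoffQueue<std::string> requests;  // socket thread -> processing thread
    HandoffQueue<std::string> replies;   // processing thread -> socket thread
};

// Stand-in for the periodically invoked processing function: handles at
// most one queued request and queues its reply. Returns true if it did.
bool processOneRequest(Channels &channels) {
    std::string request;
    if (!channels.requests.tryPop(request)) {
        return false;
    }
    channels.replies.push("reply-to:" + request);
    return true;
}

// Single-threaded walk-through of the hand-off. In real use, the push of
// the request and the final tryPop would both run on the socket thread.
void demoHandoff() {
    Channels channels;
    channels.requests.push("create");
    bool handled = processOneRequest(channels);
    std::string reply;
    bool gotReply = channels.replies.tryPop(reply);
    assert(handled && gotReply && reply == "reply-to:create");
}
```

With this layout, only the thread that created the socket would call its recv and send methods; the processing function sees nothing but the two queues.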

Last, but not least, the REQ/REP Scalable Formal Communication Pattern Archetype is prone to deadlock, because a mandatory two-step dance must be obeyed: each side must call its methods in strict alternation (.recv()-.send()-.recv()-.send()-... on the REP side, .send()-.recv()-.send()-.recv()-... on the REQ side). Otherwise the distributed tandem of Finite State Automata (dFSA) ends up in an unsalvageable mutual deadlock.
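
The alternation can be illustrated with a toy model of the REP side. This is a sketch under stated assumptions, not the libzmq or cppzmq API: RepModel, pollOnce and demoAlternation are hypothetical names, and the model pretends a request is always waiting when recv is legal. In libzmq, a REP socket that still owes a reply rejects a further receive with the error EFSM. Assuming the older cppzmq overload bool recv(message_t*, int) used in the question raises zmq::error_t for errors other than EAGAIN, the polling loop's next recv call while the reply is pending could be what terminates the application. The source does not confirm this, so treat it as a hypothesis to test.

```cpp
#include <cassert>

// Toy model only: a hypothetical stand-in that tracks which call a REP
// socket would accept next. It performs no networking.
class RepModel {
public:
    // True when a request has been received and its reply is still owed.
    bool replyOwed() const { return m_replyOwed; }

    // Models recv(): legal only when no reply is owed. The model assumes
    // a request is always waiting when the call is legal.
    bool recv() {
        if (m_replyOwed) {
            return false;  // out of sequence; a real REP socket reports an error
        }
        m_replyOwed = true;
        return true;
    }

    // Models send(): legal only when a reply is owed.
    bool send() {
        if (!m_replyOwed) {
            return false;  // nothing to reply to
        }
        m_replyOwed = false;
        return true;
    }

private:
    bool m_replyOwed = false;
};

// One iteration of a polling loop that respects the alternation: poll for
// a request only when idle, and send only when a reply is owed and its
// data is ready. Returns true if the call performed a recv or a send.
bool pollOnce(RepModel &socket, bool replyReady) {
    if (!socket.replyOwed()) {
        return socket.recv();
    }
    if (replyReady) {
        return socket.send();
    }
    return false;  // reply owed but not ready: wait, and do not recv again
}

// Walk-through: a request arrives, the reply is delayed by one iteration,
// and the loop never issues an out-of-sequence call.
void demoAlternation() {
    RepModel socket;
    bool received = pollOnce(socket, false);  // idle, so it receives
    bool waited = !pollOnce(socket, false);   // reply not ready, so it waits
    bool replied = pollOnce(socket, true);    // reply ready, so it sends
    assert(received && waited && replied && !socket.replyOwed());
}
```

Applied to the question's loop, the same gating would mean skipping the non-blocking recv while a reply is outstanding, and resuming it only after the reply has been sent.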

If one is planning to build professionally on ZeroMQ, the best next step is to re-read Pieter HINTJENS' fabulous book "Code Connected: Volume 1". It is a hard read, yet definitely worth the time, sweat, tears and effort put in.

Source: https://stackoverflow.com/questions/64700007/app-crashes-when-it-takes-too-long-to-reply-in-a-zmq-req-rep-pattern

Tags

c++

zeromq