I\'m experiencing a problem with GCP pubsub where a small percentage of data was lost when publishing thousands of messages in couple seconds.
I\'m logging both
Talked with some guy from Google, and it seems to be an issue with the Python Client:
The consensus on our side is that there is a thread-safety problem in the current python client. The client library is being rewritten almost from scratch as we speak, so I don't want to pursue any fixes in the current version. We expect the new version to become available by end of June.
Running the current code with thread_safe: false in app.yaml or better yet just instantiating the client in every call should is the work around -- the solution you found.
For detailed solution, please see the Update in the question
You shouldn't need to create a new client for every publish operation. I'm betting that the reason that that "fixed the problem" is because it mitigated a race that exists in the publisher client side. I'm also not convinced that the log line you've shown on the publisher side:
API: 200 **** sessionId: 731, messageId:108562396466545 ******
corresponds to a successful publish of sessionId 731 by publish_test_topic(). Under what conditions is that log line printed? The code that has been presented so far does not show this.
Google Cloud Pub/Sub message IDs are unique. It should not be possible for "some messages [to] taken the message_id
of another message." The fact that message ID 108562396466545 was seemingly received means that Pub/Sub did deliver the message to the subscriber and was not lost.
I recommend you check how your session_id
s are generated to ensure that they are indeed unique and that there is exactly one per message. Searching for the sessionId in your JSON via a regular expression search seems a little strange. You would be better off parsing this JSON into an actual object and accessing fields that way.
In general, duplicate messages in Cloud Pub/Sub are always possible; the system guarantees at-least-once delivery. Those messages can be delivered with the same message ID if the duplication happens on the subscribe side (e.g., the ack is not processed in time) or with a different message ID (e.g., if the publish of the message is retried after an error like a deadline exceeded).