Google Cloud Pub/Sub data lost

悲&欢浪女 2021-01-14 01:35

I'm experiencing a problem with GCP Pub/Sub where a small percentage of data was lost when publishing thousands of messages in a couple of seconds.

I'm logging both

3 Answers
  • 2021-01-14 01:45

    I talked with someone from Google, and it seems to be an issue with the Python client:

    The consensus on our side is that there is a thread-safety problem in the current python client. The client library is being rewritten almost from scratch as we speak, so I don't want to pursue any fixes in the current version. We expect the new version to become available by end of June.

    Running the current code with thread_safe: false in app.yaml, or better yet just instantiating the client in every call, is the workaround -- the solution you found.

    For the detailed solution, please see the update in the question.
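The per-call workaround can be sketched as follows. This is a minimal sketch, assuming the google-cloud-pubsub package; `publish_session`, the project/topic IDs, and the payload shape are placeholders, not code from the question:

```python
# Sketch of the workaround: create a fresh PublisherClient per publish
# call instead of sharing one client across request threads.
import json


def publish_session(project_id, topic_id, session):
    # Import inside the function so this module still loads where the
    # library is absent; in real code the import belongs at the top.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()  # new client on every call
    topic_path = publisher.topic_path(project_id, topic_id)
    data = json.dumps(session).encode("utf-8")
    future = publisher.publish(topic_path, data)
    # Block until the server acknowledges; raises if the publish failed.
    return future.result()
```

Creating a client per call is wasteful (each client opens its own connection), so it only makes sense as a stopgap until the rewritten, thread-safe client is available.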

  • 2021-01-14 01:47

    You shouldn't need to create a new client for every publish operation. I suspect that doing so "fixed the problem" only because it mitigated a race on the publisher client side. I'm also not convinced that the log line you've shown on the publisher side:

    API: 200 **** sessionId: 731, messageId:108562396466545 ******

    corresponds to a successful publish of sessionId 731 by publish_test_topic(). Under what conditions is that log line printed? The code that has been presented so far does not show this.

  • 2021-01-14 02:05

    Google Cloud Pub/Sub message IDs are unique. It should not be possible for "some messages [to have] taken the message_id of another message." The fact that message ID 108562396466545 was received means that Pub/Sub did deliver the message to the subscriber; it was not lost.

    I recommend you check how your session_ids are generated to ensure that they are indeed unique and that there is exactly one per message. Searching for the sessionId in your JSON via a regular expression search seems a little strange. You would be better off parsing this JSON into an actual object and accessing fields that way.
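For instance, instead of a regex search, the sessionId can be read by parsing the payload into an object. A stdlib-only sketch; the payload and its field names here are illustrative, not taken from the asker's actual messages:

```python
import json

# Hypothetical message body, shaped like the log line in the question.
payload = '{"sessionId": 731, "event": "publish_test"}'

record = json.loads(payload)      # parse the JSON into a dict
session_id = record["sessionId"]  # access the field directly, no regex
print(session_id)  # → 731
```

Parsing also fails loudly on malformed JSON, whereas a regex can silently match the wrong substring or miss a record entirely.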

    In general, duplicate messages in Cloud Pub/Sub are always possible; the system guarantees at-least-once delivery. Those messages can be delivered with the same message ID if the duplication happens on the subscribe side (e.g., the ack is not processed in time) or with a different message ID (e.g., if the publish of the message is retried after an error like a deadline exceeded).
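Because of at-least-once delivery, subscribers that must process each logical message exactly once typically deduplicate on an application-level key. A minimal sketch in plain Python; the in-memory set and the `handle_once` helper are hypothetical (a real system would use a bounded or persistent store):

```python
seen_ids = set()  # hypothetical in-memory dedup cache


def handle_once(message_id, payload, process):
    """Call process(payload) only the first time message_id is seen."""
    if message_id in seen_ids:
        return False  # duplicate redelivery: ack it, but skip processing
    seen_ids.add(message_id)
    process(payload)
    return True


results = []
handle_once("108562396466545", {"sessionId": 731}, results.append)
handle_once("108562396466545", {"sessionId": 731}, results.append)  # dup
print(len(results))  # → 1
```

Note this only catches subscribe-side duplicates (same message ID); publish-side retries produce a different message ID, so deduplicating those requires a key of your own, such as a unique sessionId.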
