DocumentDB Change Feed - How to see all changes to a document

旧巷少年郎 2021-02-06 11:18

This new Change Feed feature provided by DocumentDB is pretty cool. However, the documentation states:

"Each change to a document appears only once in the change feed."

1 Answer
  •  不知归路
    2021-02-06 11:47

    DocumentDB team member here. I'll start by saying: please propose or vote for support for all versions/generations of a document here: http://feedback.azure.com/forums/263030-documentdb

    Change Feed was designed to return only the latest version of each document, for two reasons:

    1. Many problems, such as data synchronization and stream processing, rely only on the latest version and do not need the intermediate versions.
    2. This approach requires no additional storage to keep every version, and no retention period that limits how long changes remain available in the change feed.

    You mentioned you're already aware of workarounds, but I'll state this for the benefit of others: this problem can be solved by inverting what's stored in DocumentDB. That is, store every version in DocumentDB by creating a new document for each change, then consolidate them by reading the change feed and upserting the latest version.
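    To make the "invert what's stored" idea concrete, here is a minimal in-memory sketch, not DocumentDB SDK code. It assumes a hypothetical insert-only collection (`VersionLog`), a hypothetical document id scheme `"<entity_id>:v<version>"`, and an integer checkpoint standing in for a change feed continuation token. Because every edit is a new document, every version appears in the feed, and `consolidate` upserts only the newest one into a "latest" view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One immutable version of a logical entity, stored as its own document."""
    doc_id: str     # hypothetical id scheme: "<entity_id>:v<version>"
    entity_id: str  # the logical document that all versions belong to
    version: int
    body: dict


class VersionLog:
    """Hypothetical in-memory stand-in for an insert-only collection and its change feed.

    Every edit creates a NEW document, so every version appears in the feed
    and no intermediate version is collapsed away.
    """

    def __init__(self) -> None:
        self._feed: list[Version] = []

    def append(self, entity_id: str, version: int, body: dict) -> None:
        self._feed.append(Version(f"{entity_id}:v{version}", entity_id, version, body))

    def read_feed(self, checkpoint: int) -> tuple[list[Version], int]:
        """Return changes after `checkpoint` plus the new checkpoint (a simplified continuation)."""
        return self._feed[checkpoint:], len(self._feed)


def consolidate(changes: list[Version], latest: dict[str, Version]) -> None:
    """Upsert the newest version of each entity into a 'latest' view."""
    for v in changes:
        current = latest.get(v.entity_id)
        # Only move forward, so replayed or out-of-order batches cannot regress the view.
        if current is None or v.version > current.version:
            latest[v.entity_id] = v


log = VersionLog()
log.append("order-42", 1, {"status": "created"})
log.append("order-42", 2, {"status": "paid"})
log.append("order-7", 1, {"status": "created"})
log.append("order-42", 3, {"status": "shipped"})

latest: dict[str, Version] = {}
changes, checkpoint = log.read_feed(0)
consolidate(changes, latest)

history = [v.body["status"] for v in changes if v.entity_id == "order-42"]
print(history)                            # ['created', 'paid', 'shipped']
print(latest["order-42"].body["status"])  # shipped
```

    The full history stays queryable in the insert-only collection, while the consolidated view holds just the latest state. The version check makes the upsert safe to repeat if the same feed batch is processed twice.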

    To answer the question in the comments: you should definitely use Change Feed rather than querying by timestamp, for the following reasons:

    1. Change Feed is much more efficient. Querying "order by timestamp" across a distributed dataset performs a global sort, whereas Change Feed is only partially ordered: it is sorted locally within each partition. Additionally, there is no query parsing overhead.
    2. Clock time is less meaningful in distributed systems due to clock skew, and differentiating between multiple updates within a second/millisecond can be important. Instead, you need the "logical time" representing the exact commit order within the database. With change feed, updates within a partition key are in exact order of commit, and you get all documents updated within a transaction stamped with the same logical timestamp.
    3. Unlike a query, Change Feed can be consumed in a distributed manner across multiple workers. This is a good fit for a downstream scalable compute framework such as Apache Storm or Azure Functions.
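    The three points above can be illustrated with a small simulation, again not real SDK code. It assumes hypothetical per-partition feeds where each change carries a logical sequence number (`lsn`, the commit order within its partition) and a client-side wall-clock timestamp that may be skewed. Sorting by wall clock misorders commits, while the per-partition feed is already in commit order, so each partition can be handed to one worker and consumed independently with no global sort. The round-robin `assign_partitions` is a simplified stand-in for real lease-based load balancing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    partition: str      # partition key range the change belongs to
    lsn: int            # logical sequence number: exact commit order within the partition
    wall_clock_ms: int  # hypothetical client-side timestamp, subject to clock skew
    doc_id: str


# Hypothetical per-partition feeds. Within each list, entries are in commit (LSN) order.
# The writer of A/lsn=2 had a clock about 5 ms behind, so its wall-clock stamp looks "older".
FEEDS: dict[str, list[Change]] = {
    "A": [
        Change("A", 1, 1_000_003, "doc-1"),
        Change("A", 2, 999_998, "doc-1"),
        Change("A", 3, 1_000_010, "doc-2"),
    ],
    "B": [Change("B", 1, 1_000_001, "doc-9"), Change("B", 2, 1_000_004, "doc-9")],
    "C": [Change("C", 1, 1_000_002, "doc-5")],
}


def order_by_wall_clock(changes: list[Change]) -> list[int]:
    """LSNs in the order a 'sort by timestamp' query would return them."""
    return [c.lsn for c in sorted(changes, key=lambda c: c.wall_clock_ms)]


def order_by_commit(changes: list[Change]) -> list[int]:
    """LSNs in true commit order, which the change feed preserves within a partition."""
    return [c.lsn for c in sorted(changes, key=lambda c: c.lsn)]


def assign_partitions(partitions: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Give each partition to exactly one worker, round-robin (simplified lease balancing)."""
    assignment: dict[str, list[str]] = {w: [] for w in workers}
    for i, p in enumerate(sorted(partitions)):
        assignment[workers[i % len(workers)]].append(p)
    return assignment


def run_worker(partitions: list[str], feeds: dict[str, list[Change]]) -> list[tuple[str, int]]:
    """Consume each owned partition independently, in commit order; no cross-partition sort."""
    processed: list[tuple[str, int]] = []
    for p in partitions:
        for c in feeds[p]:
            processed.append((c.partition, c.lsn))
    return processed


print(order_by_wall_clock(FEEDS["A"]))  # [2, 1, 3]: not the commit order
print(order_by_commit(FEEDS["A"]))      # [1, 2, 3]
plan = assign_partitions(list(FEEDS), ["worker-0", "worker-1"])
print(plan)                             # {'worker-0': ['A', 'C'], 'worker-1': ['B']}
print(run_worker(plan["worker-1"], FEEDS))  # [('B', 1), ('B', 2)]
```

    Because ordering is only guaranteed within a partition, adding workers scales consumption without any coordination beyond deciding who owns which partition.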
