Most efficient way to detect changes of a remote CMIS repository?

删除回忆录丶 submitted on 2019-12-03 16:38:14

Using the repository's change log is the right way to go, but be aware that not every repository supports it. For example, with Alfresco you must configure the audit subsystem and set audit.cmischangelog.enabled=true in alfresco-global.properties.

To find out whether your repository supports change logs, look at the results of the repository's getCapabilities response. If 'Changes' is set to 'None', your repository doesn't support change logs.
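A minimal Python sketch of that check follows. It assumes the capabilities arrive as a dict with a 'Changes' key, matching the wording above; that dict shape is an assumption, not a documented API. The CMIS spec's values for this capability are none, objectidsonly, properties and all, so anything other than none means a change log is available.

```python
def supports_change_log(capabilities: dict) -> bool:
    """Return True if the repository reports a change log.

    Assumes `capabilities` is a dict like {"Changes": "properties", ...}.
    A missing key, a Python None, or any casing of "none" all mean
    the repository has no change log.
    """
    value = str(capabilities.get("Changes", "none")).strip().lower()
    return value != "none"
```

For example, `supports_change_log({"Changes": "objectidsonly"})` is True, while `supports_change_log({"Changes": "None"})` is False.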

Assuming it does, you need to ask the repository for its latest change log token. You can get that from getRepositoryInfo. Save that before you call getContentChanges. Then, on the next call, pass in the token. You'll get the changes made since the token was issued.

So, your code needs to:

  1. Check getCapabilities for something other than Changes = None
  2. Save the getRepositoryInfo's latestChangeLogToken
  3. The first time you ask, call getContentChanges with no arguments
  4. The next time you ask, call getContentChanges with the last saved token
  5. You can then process the result set. Each change log entry tells you its type (created, updated, deleted, or security for permission changes; see the spec for exact values) and provides the cmis:objectId of the changed object.
  6. Repeat from step 2.
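Steps 2 to 5 can be sketched in Python against a hypothetical in-memory repository. FakeRepository, latest_change_log_token, get_content_changes and poll_once are illustrative names, not the cmislib or OpenCMIS API; the fake's tokens are simple list positions, whereas real repositories return opaque token strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ChangeEvent:
    object_id: str    # cmis:objectId of the changed object
    change_type: str  # "created", "updated", "deleted" or "security"


@dataclass
class FakeRepository:
    """Hypothetical in-memory stand-in for a CMIS repository (illustration only).

    Tokens here are positions in an append-only event list; real repositories
    hand out opaque token strings.
    """
    events: List[ChangeEvent] = field(default_factory=list)

    def latest_change_log_token(self) -> str:
        # Plays the role of latestChangeLogToken from getRepositoryInfo.
        return str(len(self.events))

    def get_content_changes(self, token: Optional[str] = None) -> List[ChangeEvent]:
        # Plays the role of getContentChanges: with no token, the whole log;
        # otherwise only events recorded after the token was issued.
        start = 0 if token is None else int(token)
        return self.events[start:]


def poll_once(
    repo: FakeRepository,
    saved_token: Optional[str],
    handle: Callable[[ChangeEvent], None],
) -> str:
    """Run steps 2-5 once and return the token to save for the next call."""
    next_token = repo.latest_change_log_token()      # step 2: save the token first
    changes = repo.get_content_changes(saved_token)  # steps 3-4: None on the first call
    for event in changes:                            # step 5: process each entry
        handle(event)
    return next_token
```

Calling poll_once(repo, None, handler) the first time, then poll_once(repo, token, handler) with each returned token, gives the loop in step 6. Because the token is saved before fetching, a change that lands between the two calls can be reported twice, and depending on the repository the entry for the saved token itself may come back again, so the handler should tolerate repeats.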

I have a Python script, "cmis-sync", that does one-way synchronization using this approach. I've tested it against Alfresco as the source and the OpenCMIS InMemory repository as the target. If there is interest, I can make it available.

Ellipson

Based on some digging through the CMIS specification you posted, a better version of idea 3 is easy to accomplish.

2.1.11 Change Log

CMIS provides a “change log” mechanism to allow applications to easily discover the set of changes that have occurred to objects stored in the repository since a previous point in time. This change log can then be used by applications such as search services that maintain an external index of the repository to efficiently determine how to synchronize their index to the current state of the repository (rather than having to query for all objects currently in the repository).

Entries recorded in the change log are referred to below as “change events”.

Note that change events in the change log MUST be returned in ascending order from the time when the change event occurred.

Using the tools of your choice, you should be able to do an initial pull of the entire repository and save the time the pull was performed. Subsequent queries to the repository (at an interval of your choosing) follow this procedure:

  • Pull down the CMIS changelog from the repository
  • Parse all changes created after the previous pull
  • Perform operations based on the ChangeType enum: for example, if the "deleted" enum is present for an objectID, delete that object locally.
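This procedure can be sketched as a Python function that applies change events newer than the last pull to a local mirror. The assumptions: the events have already been read from the change log into TimestampedChange records, the mirror is a plain dict keyed by object ID, and fetch is a hypothetical callback that downloads an object's current state. None of these names come from a CMIS library.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable


@dataclass
class TimestampedChange:
    object_id: str         # cmis:objectId of the changed object
    change_type: str       # "created", "updated", "deleted" or "security"
    change_time: datetime  # when the change event occurred


def apply_changes(
    mirror: Dict[str, dict],
    events: Iterable[TimestampedChange],
    last_pull: datetime,
    fetch: Callable[[str], dict],
) -> datetime:
    """Apply events newer than `last_pull` to `mirror` and return the new cutoff.

    `fetch` is a hypothetical callback returning an object's current state.
    """
    newest = last_pull
    for event in events:  # the spec requires ascending order by time
        if event.change_time <= last_pull:
            continue  # already reflected by an earlier pull
        if event.change_type == "deleted":
            mirror.pop(event.object_id, None)
        else:
            # created, updated and security changes: refresh the local copy
            mirror[event.object_id] = fetch(event.object_id)
        newest = max(newest, event.change_time)
    return newest
```

Returning the newest event time, rather than the client's clock, keeps the cutoff in the repository's own timestamps; this is a small variation on saving the time of the pull.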