arangodump: How do I know the latest “revision”?

Question

I'm manually parsing and importing data from an arangodump, which contains records for every revision of every document. The problem is that I cannot tell which record is the latest revision.

(This is also a problem for deleted documents, which show up in the arangodump as records with a revision but an empty document.)

From the docs:

Clients can use revisions ids to perform simple equality/non-equality comparisons (e.g. to check whether a document has changed or not), but they should not use revision ids to perform greater/less than comparisons with them to check if a document revision is older than one another, even if this might work for some cases.

The docs don't give me much hope. Is this even possible?

If not, what is the proper way to manually import an arangodump into a different application?

Answer 1:

ArangoDump is intended to give you a snapshot of the existing database as fast as possible. Therefore it doesn't give you the contents at the collection level, but what is on disk. As @CoDEmanX noted, ArangoExport will give you the collection-level view, at the cost of more resource usage on the database server.
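Because the dump mirrors what is on disk, a consumer that wants the collection-level view has to replay the records itself. Below is a minimal sketch of that replay, under assumptions you should verify against your own dump: each line is a JSON object of the form {"type": ..., "data": {...}}, type 2300 means "document written" and 2302 means "document removed" (the codes used in ArangoDB's MMFiles-era replication format; other versions may lay lines out differently), and lines appear in the order they were written. The function name replay_dump is hypothetical.

```python
import json
from typing import Dict, Iterable

# Assumed marker type codes (MMFiles-era replication format); verify them
# against your own dump files before relying on this.
DOCUMENT_MARKER = 2300  # insert/replace: "data" holds the full document
REMOVE_MARKER = 2302    # removal: "data" holds only _key and _rev


def replay_dump(lines: Iterable[str]) -> Dict[str, dict]:
    """Replay dump lines in file order; the last marker seen for a _key wins.

    Assumes lines are in write order. _rev values are never compared with
    < or >, only file order decides which revision is the latest.
    """
    state: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        doc = record["data"]
        key = doc["_key"]
        if record["type"] == DOCUMENT_MARKER:
            state[key] = doc      # a later revision overwrites an earlier one
        elif record["type"] == REMOVE_MARKER:
            state.pop(key, None)  # deleted document: drop whatever we had
    return state
```

For example, with open(path) as f: latest = replay_dump(f) yields one entry per surviving _key. Relying on file order rather than on _rev ordering keeps this in line with the documentation quoted in the question.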

To explain why you get older versions of documents, we have to take a deeper look at the database itself.

An insert into the database creates a new document with a _key. Once you replace it, e.g. with UPDATE, what actually happens is that an invisible document (a marker) is written to remove the old version. After that, a new version of the document is created.
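The write sequence described above can be modeled as a toy append-only log. This is an illustration of the description, not ArangoDB's storage code; the helpers write_insert and write_update and the marker dictionaries are hypothetical. It shows why a dump taken from such a log contains every older revision: nothing is overwritten in place.

```python
import uuid
from typing import Dict, List

Log = List[Dict[str, object]]


def new_rev() -> str:
    # Random, deliberately unordered revision ids: like real _rev values,
    # they support equality checks only, not older/newer comparisons.
    return uuid.uuid4().hex[:8]


def write_insert(log: Log, key: str, body: Dict[str, object]) -> str:
    """Append a document marker for a fresh revision and return its rev."""
    rev = new_rev()
    log.append({"marker": "document", "_key": key, "_rev": rev, **body})
    return rev


def write_update(log: Log, key: str, old_rev: str, body: Dict[str, object]) -> str:
    """Append a remove marker for the old revision, then the new version."""
    log.append({"marker": "remove", "_key": key, "_rev": old_rev})
    return write_insert(log, key, body)
```

After one insert and one update of the same key, the log holds three entries (document, remove, document), and the first document is still physically present.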

All of this is done linearly, in a write-ahead log (WAL). The WAL is written sequentially, but only part of its content is guaranteed to have been synced to disk at any given time. Once a transaction demands that a document be sealed (durably written), execution is paused until the kernel confirms that the log up to that point has been synchronized to storage.
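The durability behavior described above can be sketched with a plain append-only file. This is an illustrative stand-in, not ArangoDB's WAL implementation; the class WriteAheadLog is hypothetical, and the wait_for_sync flag only loosely mirrors ArangoDB's waitForSync option. Ordinary appends may still sit in buffers or the OS page cache; a synced append blocks until the kernel reports the data is on storage.

```python
import os


class WriteAheadLog:
    """Minimal append-only log (illustrative sketch, not ArangoDB's WAL)."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "ab")  # append-only, binary

    def append(self, entry: bytes, wait_for_sync: bool = False) -> None:
        self._f.write(entry + b"\n")  # sequential append; may stay buffered
        if wait_for_sync:
            self._f.flush()             # hand Python's buffer to the OS
            os.fsync(self._f.fileno())  # block until the kernel reports durability

    def close(self) -> None:
        self._f.flush()
        os.fsync(self._f.fileno())
        self._f.close()
```

Syncing only when a caller asks for it is the trade-off the answer describes: most appends stay fast, and only the writes that demand a guarantee pay for the fsync.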

So much for the path to disk. It is implemented this way to give you maximum throughput while still guaranteeing that certain things have been written (and are not stuck somewhere in disk caches, etc.).

A later job cleans everything up and ties up loose ends. This is called the 'collection'. It collects documents from the WAL and stores them in permanent database files. It also tries to combine delete markers with existing documents, so that both finally disappear.

So once the collection has run, deleted documents that have been combined with their delete markers actually disappear. Multiple database files may be combined into one if their size falls below a certain threshold. Some delete markers may even find their documents only after such a combination.
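The last point, that a delete marker can only cancel its document once both end up in the same file, can be shown with a toy model. Everything here is hypothetical (compact_file, merge_small_files, the tuple markers, and a threshold counted in markers rather than bytes), and it assumes each key has at most one document marker so that dropping a delete marker never resurrects an older revision.

```python
from typing import List, Tuple

# Toy marker: (kind, key, rev) with kind "doc" or "remove".
Marker = Tuple[str, str, str]
Datafile = List[Marker]


def compact_file(markers: Datafile) -> Datafile:
    """Cancel each remove marker against a document for the same key in this file."""
    kept: Datafile = []
    for marker in markers:
        kind, key, _rev = marker
        if kind == "remove":
            match = next((i for i in range(len(kept) - 1, -1, -1)
                          if kept[i][0] == "doc" and kept[i][1] == key), None)
            if match is not None:
                del kept[match]  # document and its delete marker both vanish
                continue
        kept.append(marker)      # unmatched remove markers must stay
    return kept


def merge_small_files(files: List[Datafile], threshold: int) -> List[Datafile]:
    """Merge adjacent files (oldest first) while both are below threshold, then compact."""
    merged: List[Datafile] = []
    for f in files:
        if merged and len(merged[-1]) < threshold and len(f) < threshold:
            merged[-1] = merged[-1] + f
        else:
            merged.append(list(f))
    return [compact_file(f) for f in merged]
```

With a document for key "a" in one file and its delete marker in the next, compacting each file alone changes nothing; after merging the two small files, both markers for "a" disappear.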

Source: https://stackoverflow.com/questions/51604089/arangodump-how-do-i-know-the-latest-revision

Tags

arangodb