Memory usage of ArangoDB

半城伤御伤魂 提交于 2019-11-29 22:27:36

ArangoDB stores all data in memory-mapped files. Each collection can have 0 to n datafiles, with a default filesize of 32 MB each (note that this filesize can be adjusted globally or on a per-collection level). An empty collection (that never had any data) will not have a datafile. The first write to a collection will create the datafile, and whenever a datafile is full, a new one will be created automatically.

Collections allocate datafiles in chunks of 32 MB by default. If you do have many but small collections this might waste some memory. If you many few but big collections, the potential waste (free space at the end of a datafile) probably doesn't matter too much.

Whenever any ArangoDB operation reads data from or writes data to a memory-mapped datafile, the operating system will first translate the offset into the file into a page number. This is because each datafile is implicitly split into pages of a specific size. How big a page is is platform-dependent, but let's assume pages are 4 KB in size. So a datafile with a default filesize will have 8192 pages.

After the OS has translated the offset into the file into a page number, it will make sure the data of requested page are present in physical RAM. If the page is not yet in physical RAM, the operating system will issue a page fault to trigger loading of the requested page from disk or swap into physical RAM. This will eventually make the complete page available in RAM, and any reads or writes to the page's data may occur after that.

All of this is done by the operating system's virtual memory manager. The operating system is free to map as many pages from a datafile into RAM as it thinks is good. For example, when a memory-mapped file is accessed sequentially, the operating system will likely be clever and read-ahead many pages, so they are already in physical RAM when actually accessed.

The OS is also free to swap out some or all pages of a datafile. It will likely swap out pages if there is not enough physical RAM available to keep all pages from all datafiles in RAM at the same time. It may also swap out pages that haven't been used for a while, to make RAM available for other operations. It will likely use some LRU algorithm for this.

How the virtual memory manager of an OS behaves exactly is wildly different across platforms and implementations. Most systems also allow configuring the VM subsystem. For example, here are some parameters for Linux's VM subsystem.

It is therefore hard to tell how much physical memory ArangoDB will actually use for a given number of collection and their datafiles. If the collections aren't accessed at all, having the datafiles memory-mapped might use almost no RAM as the OS has probably swapped the collections out fully or at least partially. If the collections are heavily in use, the OS will likely have their datafiles fully mapped into RAM. But in both cases the memory counts as memory-mapped. This is you can have a much higher virtual memory usage than you have physical RAM.

As mentioned before, the OS has to do a lot of work when accessing pages that are not in RAM, and you want to avoid this if possible. If the total size of your frequently used collections exceeds the size of the physical RAM, the OS has no alternative but to swap pages out and in a lot when you access these collections. Using an SSD for the swap will likely be better than using a spinning HDD, but is still far slower than RAM access. Long story short: the data of your active collections (datafiles plus indexes) should fit into physical RAM if possible, or you will see a lot of disk activity.

Apart from that, ArangoDB does not only allocate virtual memory for the collection datafiles, but it also starts a few V8 threads (V8 is the JavaScript engine in ArangoDB) that also use virtual memory. This virtual memory is not file-backed.

In an empty ArangoDB V8 accounts for most of the virtual memory usage. For example, on my 64 bit computer, the V8 threads consume about 5 GB of virtual memory (but ArangoDB in total only uses 140 MB RAM), whereas on my 32 bit computer with less RAM, the V8 threads use about 600 - 700 MB virtual memory. In your case, with the 4.5 GB VM usage, I suspect V8 is the reason, too.

The virtual memory usage for the V8 threads obviously correlates with the number of V8 threads started. For example, increasing the value of the startup parameter --server.threads will start more threads and use more virtual memory for V8, and lowering the value will start less threads and use less virtual memory.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!