Optimizing Put Performance in Berkeley DB

大兔子大兔子 提交于 2020-01-20 19:47:08

问题


I just started playing with Berkeley DB a few days ago so I'm trying to see if there's something I've been missing when it comes to storing data as fast as possible.

Here's some info about the data: - it comes in 512 byte chunks - chunks come in order - chunks will be deleted in FIFO order - if i lose some data off the end because of power failure that's ok as long as the whole db isn't broken

After reading the a bunch of the documentation it seemed like a Queue db was exactly what I wanted.

However, after trying some test code my fastest results were about 1MByte per second just looping through a DB->put with DB_APPEND set. I also tried using transactions and bulk puts but both of these slowed things down considerably so I didn't pursue them for much time. I was inserting into a fresh db created on a NANDFlash chip on my Freescale i.MX35 dev board.

Since we're looking to get at least 2MBytes per second write speeds, I was wondering if there's something I missed that can improve my speeds since I know that my hardware can write faster than this.


回答1:


Try putting this into your DB_CONFIG:

set_flags DB_TXN_WRITE_NOSYNC
set_flags DB_TXN_NOSYNC

From my experience, these increase write performance a lot.


DB_TXN_NOSYNC If set, Berkeley DB will not write or synchronously flush the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the application or system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how many log updates can fit into the log buffer, how often the operating system flushes dirty buffers to disk, and how often the log is checkpointed Calling DB_ENV->set_flags with the DB_TXN_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.

The DB_TXN_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.


DB_TXN_WRITE_NOSYNC If set, Berkeley DB will write, but will not synchronously flush, the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how often the system flushes dirty buffers to disk and how often the log is checkpointed. Calling DB_ENV->set_flags with the DB_TXN_WRITE_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_WRITE_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.

The DB_TXN_WRITE_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.

See http://www.mathematik.uni-ulm.de/help/BerkeleyDB/api_c/env_set_flags.html for more details.




回答2:


I suggest you must use transactions / TDS datastore if as you mention you cannot recreate a database (i.e. it isnt just a local cache) if it gets corrupted. If you dont care about loosing a few items in event of a crash/power outage then DB_TXN_WRITE_NOSYNC will improve TDS performance, you database will still be integral and recoverable. If you store using BTREE and a numeric index (if you have no natural key) and watch out for endian issues so you get good key locality and high page utilization then you should be able to get way more than 2000 inserts a second, especially to SSD, especially if you Use DbMultileKeyDataBuilder to do bulk inserts.



来源:https://stackoverflow.com/questions/3825022/optimizing-put-performance-in-berkeley-db

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!