In our application we use std::map
to store (key, value) data and use serialization to store that data on disk. With this approach we are finding that the disk I/O is performance bottleneck and finding values using key is not very fast.
I have come across LevelDB and thinking of using it. But I have some questions.
- LevelDB's documentation says its made for (string, string) key value pair. Does it mean that I can not use for custom key value pairs?
- It seems the difference between
std::map
and LevelDB is that LevelDB is persistent andstd::map
works in memory. So does it mean the disk I/O bottleneck will be more problematic for levelDB.
More specifically can anybody please explain if LevelDB could be better choice than std::map
?
PS: I tried using hash_map
s but it appears to be slower than std::map
LevelDB just does something else than std::map.
Are you really saying you want (high performance) persistence for std::map?
look at std::map with a custom allocator. Allocate the entries from a memory mapped region and use fsync to to ensure the information hits the disk at strategic moments in time.
perhaps combine that with EASTL (which boasts a faster std::map and thrives with custom allocators - in fact they have no default allocator)
look at tuning your hash_map (std::unorderded_map); if hash_maps are slower, you should look into (a) loadfactor (b) hash function tuning
last but not least: evaluate the use of Boost Serialization for binary serialization of your map (whatever implementation you picked). In my experience Boost Serialization performance is top of the bill.
What you're doing now is this:
Say you have 1000000 records in a file. You read the whole file into std::map, this takes about ~1000000 operations. You use find/insert to locate and/or insert an element, this takes logarithmic time (about 20 comparisons). And now you save the whole file again, transferring all these 1000000 records back to the file.
The problem is that you benefit absolutely nothing from using std::map. std::map gives you fast search times (logarithmic), but initializing and serializing the whole map per each lookup nullifies it's benefits.
What you need is either redesign you program so you will load the map once at the startup and serialize it once at the termination. Or, perhaps if you need the database semantics, go for a real database implementation. I suggest using SQLite, although LevelDB might be just as good for you.
来源:https://stackoverflow.com/questions/7804692/leveldb-vs-stdmap