I'm using a std::map (VC++ implementation) and it's a little slow for lookups via the map's find method. The key type is std::string.
Depending on the use case, there are some other techniques you can use. For example, we had an application that needed to keep track of over a million different file paths. The problem was that there were thousands of objects that each needed to keep small maps of these file paths.
Since adding new file paths to the data set was an infrequent operation, whenever a path was added to the system, a master map was searched. If the path was not found, it was added and a new sequential integer (starting at 1) was returned. If the path already existed, the previously assigned integer was returned. Then the map maintained by each object was converted from a string-keyed map to an integer-keyed map. Not only did this greatly improve performance, it also reduced memory usage by eliminating so many duplicate copies of the strings.
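A minimal sketch of that scheme, assuming C++11. The names PathRegistry, intern and PerObjectMap are made up for illustration and are not from the original application:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: hands out one small integer per distinct path.
class PathRegistry {
public:
    // Returns the id already assigned to `path`, or assigns the next one.
    // Ids start at 1, as in the description above.
    int intern(const std::string& path) {
        std::map<std::string, int>::iterator it = ids_.find(path);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(ids_.size()) + 1;
        ids_.insert(std::make_pair(path, id));
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::map<std::string, int> ids_;  // the "master map"
};

// Each object then keeps a small map keyed by the integer id, not the string.
// The mapped type (int here) is a placeholder for whatever the object stores.
typedef std::map<int, int> PerObjectMap;
```

The string comparison cost is paid once, in intern(), when a path enters the system. Every later lookup in a PerObjectMap compares integers only.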
Sure, this is a very specific optimization. But when it comes to performance improvements, you often find yourself having to make tailored solutions to specific problems.
And I hate strings :) Not only are they slow to compare, but they can really thrash your CPU caches in high-performance software.
Here are some things you can consider:
0) Are you sure this is where the performance bottleneck is? Do you have results from Quantify, Cachegrind, gprof or something like that? Because lookups on such a small map should be fairly fast...
1) You can override the functor used to compare the keys in std::map<>; there is a third template parameter to do that. I doubt you can do much better than operator<, however.
2) Are the contents of the map changing a lot? If not, and given the very small size of your map, maybe using a sorted vector and binary search could yield better results (for example, because you can exploit memory locality better).
3) Are the elements known at compile time? You could use a perfect hash function to improve lookup times if that is the case. Search for gperf on the web.
4) Do you have a lot of lookups that fail to find anything? If so, maybe comparing with the first and last elements in the collection may eliminate many mismatches quicker than a full search every time.
These have been suggested already, but in more detail:
5) Since you have so few strings, maybe you could use a different key. For example, are your keys all the same size? Can you use a class containing a fixed-length array of characters? Can you convert your strings to numbers or some data structure with only numbers?
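As a rough illustration of suggestion 2, here is a sketch of a sorted vector searched with std::lower_bound, assuming C++11. Entry, keyLess, sortEntries and findValue are names invented for this example, and the int payload is a placeholder:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;  // key, value

// Compares a stored entry against a bare key, as std::lower_bound expects.
inline bool keyLess(const Entry& e, const std::string& key) {
    return e.first < key;
}

// Sort once, after the (infrequent) modifications are done.
inline void sortEntries(std::vector<Entry>& v) {
    std::sort(v.begin(), v.end());
}

// Binary search over contiguous storage.
// Returns a pointer to the value, or nullptr when the key is absent.
inline const int* findValue(const std::vector<Entry>& v, const std::string& key) {
    std::vector<Entry>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), key, keyLess);
    return (it != v.end() && it->first == key) ? &it->second : nullptr;
}
```

The number of string comparisons is about the same as in a balanced tree, but the entries sit next to each other in memory instead of in separately allocated nodes. Whether that helps in practice is something only a profiler can tell you.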
Why don't you use a hash table instead? boost::unordered_map would do. Or you can roll your own solution, and store the CRC of a string instead of the string itself. Or better yet, put #defines for the strings, and use those for lookup, e.g.,
#define STRING_1 1  /* integer key used in place of the string "STRING_1" */
You might consider pre-computing a hash for a string, and saving that in your map. Doing so gives the advantage of hash compares instead of string compares during the search through the std::map tree.
class HashedString
{
    unsigned m_hash;
    std::string m_string;
public:
    // HashString is whatever string hash function you choose (not shown here).
    HashedString(const std::string& str)
        : m_hash(HashString(str))
        , m_string(str)
    {}
    // ... copy constructor and etc...
    unsigned GetHash() const { return m_hash; }
    const std::string& GetString() const { return m_string; }
};
This has the benefit of computing the hash of the string only once, on construction. After this, you could implement a comparison function:
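struct comp
{
    // Must be const so that std::map can call it on a const comparator.
    bool operator()(const HashedString& lhs, const HashedString& rhs) const
    {
        // Cheap integer compare first; strings only when the hashes tie.
        if(lhs.GetHash() < rhs.GetHash()) return true;
        if(lhs.GetHash() > rhs.GetHash()) return false;
        return lhs.GetString() < rhs.GetString();
    }
};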
Since hashes are now computed on HashedString construction, they are stored that way in the std::map, and so the compare is a very quick integer compare in the overwhelming majority of cases, falling back on a standard string compare only when the hashes are equal.
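To see the pieces working together, here is a self-contained sketch assuming C++11. It uses std::hash<std::string> as a stand-in for HashString, which is purely an assumption on my part; any decent string hash would do. HashThenStringLess, HashedMap and lookup are names invented for the example:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Assumed stand-in for HashString; the original does not say which hash to use.
inline std::size_t HashString(const std::string& s) {
    return std::hash<std::string>()(s);
}

class HashedString {
    std::size_t m_hash;
    std::string m_string;
public:
    HashedString(const std::string& str)
        : m_hash(HashString(str)), m_string(str) {}
    std::size_t GetHash() const { return m_hash; }
    const std::string& GetString() const { return m_string; }
};

// Orders by hash first, and by string only when two hashes are equal.
struct HashThenStringLess {
    bool operator()(const HashedString& lhs, const HashedString& rhs) const {
        if (lhs.GetHash() != rhs.GetHash()) return lhs.GetHash() < rhs.GetHash();
        return lhs.GetString() < rhs.GetString();
    }
};

typedef std::map<HashedString, int, HashThenStringLess> HashedMap;

// Returns a pointer to the mapped value, or nullptr when the key is absent.
inline const int* lookup(const HashedMap& m, const HashedString& key) {
    HashedMap::const_iterator it = m.find(key);
    return it == m.end() ? nullptr : &it->second;
}
```

Two things to keep in mind. The map is now ordered by hash, so iterating it no longer visits the keys alphabetically. And a lookup that starts from a plain std::string pays for one hash computation to build the HashedString key, so the gain is largest when the key objects are built once and reused.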
hash_map is not standard; try using unordered_map, available in TR1 (which is available in Boost if your toolchain doesn't already have it).
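For reference, a small sketch of the same kind of lookup with a hash table. It assumes a C++11 toolchain, where the container is std::unordered_map in <unordered_map>; with TR1 or Boost the same container lives in the std::tr1 or boost namespace instead. NameTable and valueOr are names made up for the example:

```cpp
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, int> NameTable;

// Returns the stored value, or `fallback` when the key is absent.
inline int valueOr(const NameTable& t, const std::string& key, int fallback) {
    NameTable::const_iterator it = t.find(key);
    return it == t.end() ? fallback : it->second;
}
```

Each lookup hashes the key once and then usually needs a single string comparison to confirm the match, rather than one comparison per tree level.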
For small numbers of strings you might be better off using a vector, as map is typically implemented as a tree.
Where you have long common substrings, a trie might be a better data structure than a map or a hash_map. I said "might", though - a hash_map already only traverses the key once per lookup, so should be fairly fast. I won't discuss it further since others already have.
You could also consider a splay tree if some keys are more frequently looked up than others, but of course this makes the worst-case lookup worse than a balanced tree, and lookups are mutating operations, which may matter to you if you're using e.g. a reader-writer lock.
If you care about the performance of lookups more than modifications, you might do better with an AVL tree than a red-black tree, which I think is what STL implementations generally use for map. An AVL tree is typically better balanced and so will on average require fewer comparisons per lookup, but the difference is marginal.
Finding an implementation of these that you're happy with might be an issue. A search on the Boost main page suggests they have a splay and AVL tree but not a trie.
You mentioned in a comment that you never have a lookup that fails to find anything. So you could in theory skip the final comparison, which in a tree of 15 < 2^4 elements could give you something like a 20-25% speedup without doing anything else. In fact, maybe more than that, since equal strings are the slowest to compare. Whether it's worth writing your own container just for this optimisation is another question.
You might also consider locality of reference - I don't know whether you could avoid the occasional page miss by allocating the keys and the nodes out of a small heap. If you only need about 15 entries at a time, then assuming a file name limit below 256 bytes you could ensure that everything accessed during a lookup fits into a single 4k page (apart from the key being looked up, of course). It may be that comparing the strings is insignificant compared with a couple of page loads. However, if this is your bottleneck there must be an enormous number of lookups going on, so I'd guess that everything is reasonably close to the CPU. Worth checking, maybe.
Another thought: if you are using pessimistic locking on a structure where there's a lot of contention (you said in a comment the program is massively multi-threaded) then regardless of what the profiler tells you (what code the CPU cycles are spent in), it might be costing you more than you think by effectively limiting you to 1 core. Try a reader-writer lock?
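If you want to try that, here is one possible shape for it. This sketch assumes a C++17 compiler for std::shared_mutex (Boost.Thread offers boost::shared_mutex on older toolchains), and SharedLookupTable is a hypothetical wrapper, not something from your code:

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical wrapper: any number of threads may call find() concurrently,
// while set() takes the lock exclusively.
class SharedLookupTable {
public:
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive (writer)
        table_[key] = value;
    }

    // Copies the value out while the shared lock is held.
    // Returns false when the key is absent.
    bool find(const std::string& key, int& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared (reader)
        std::map<std::string, int>::const_iterator it = table_.find(key);
        if (it == table_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> table_;
};
```

Copying the value out, rather than returning an iterator or reference, keeps readers from touching the map after the lock is released. Note that this only works with containers whose lookups do not modify the structure, which rules out the splay tree mentioned above.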