I am dabbling in PowerShell and am completely new to .NET.
I am running a PS script that starts with an empty hash table. The hash table will grow to at least 15,000 to 20,000 entries.
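For reference, the original approach was a single hashtable that every entry went into (the $records collection and its EmailAddress property here are just illustrative):

```powershell
# Minimal sketch of the original approach: one hashtable that keeps growing.
$master = @{}

foreach ($record in $records) {
    # Every insert lands in the same, increasingly large hashtable.
    $master[$record.EmailAddress] = $record
}
```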
So it's a few weeks later, and I wasn't able to come up with a perfect solution. A friend at Google suggested splitting the hash into several smaller hashes. He noted that each time I went to look up a key I would have several misses until I found the right "bucket", but said the read penalty wouldn't be nearly as bad as the write penalty incurred when the collision-resolution algorithm ran while inserting entries into the (already giant) hash table.
I took this idea one step further and split the hash into 16 smaller buckets. When inserting an email address as a key, I first compute a hash of the email address itself and take it mod 16 to get a consistent value between 0 and 15. I then use that value as the "bucket" number.
So instead of one giant hash, I have a 16-element array whose elements are hash tables keyed by email address (sketched below).
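In PowerShell terms, here is a minimal sketch of what that looks like. The helper names (Get-Bucket, Add-Entry, Get-Entry) and the shape of the stored value are just illustrative:

```powershell
# A minimal sketch of the bucketed structure; the per-address value ($Value)
# can be anything, and the function names are my own.
$bucketCount = 16

# A 16-element array whose elements are ordinary hashtables.
$buckets = @(1..$bucketCount | ForEach-Object { @{} })

function Get-Bucket([string]$EmailAddress) {
    # Hash the key itself, then mod 16 for a consistent bucket number (0-15).
    return ($EmailAddress.GetHashCode() -band 0x7FFFFFFF) % $bucketCount
}

function Add-Entry([string]$EmailAddress, $Value) {
    # Each insert goes into one small hashtable instead of one giant one.
    $buckets[(Get-Bucket $EmailAddress)][$EmailAddress] = $Value
}

function Get-Entry([string]$EmailAddress) {
    # Lookups recompute the same bucket number, so there is only one probe.
    return $buckets[(Get-Bucket $EmailAddress)][$EmailAddress]
}
```

Because the bucket number is computed from the key itself, reads don't have to search multiple buckets the way my friend originally described; every key maps to exactly one of the 16 tables.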
Building the in-memory representation of my "master list" of 20,000+ email addresses with the split-up hash table buckets is now roughly 10 times faster.
Accessing the data in the hashes shows no noticeable delay. This is the best solution I've been able to come up with so far. It's slightly ugly, but the performance improvement speaks for itself.
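If anyone wants to reproduce the comparison, Measure-Command is a built-in way to time the build step. This sketch assumes the Add-Entry helper from above and an illustrative $records collection with an EmailAddress property:

```powershell
# Time building the single giant hashtable versus the 16 bucketed hashtables.
$elapsedSingle = Measure-Command {
    $single = @{}
    foreach ($record in $records) { $single[$record.EmailAddress] = $record }
}

$elapsedBuckets = Measure-Command {
    foreach ($record in $records) { Add-Entry $record.EmailAddress $record }
}

"Single hashtable: {0:N1} s" -f $elapsedSingle.TotalSeconds
"16 buckets:       {0:N1} s" -f $elapsedBuckets.TotalSeconds
```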