I want to know how I could perform some kind of index on keys from a python dictionary. The dictionary holds approx. 400,000 items, so I am trying to avoid a linear search.
If you only need to find keys that start with a prefix then you can use a binary search. Something like this will do the job:
import bisect
words = sorted("""
a b c stack stacey stackoverflow stacked star stare x y z
""".split())
n = len(words)
print n, "words"
print words
print
tests = sorted("""
r s ss st sta stack star stare stop su t
""".split())
for test in tests:
i = bisect.bisect_left(words, test)
if words[i] < test: i += 1
print test, i
while i < n and words[i].startswith(test):
print i, words[i]
i += 1
Output:
12 words
['a', 'b', 'c', 'stacey', 'stack', 'stacked', 'stackoverflow', 'star', 'stare',
'x', 'y', 'z']
r 3
s 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
ss 3
st 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
sta 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
stack 4
4 stack
5 stacked
6 stackoverflow
star 7
7 star
8 stare
stare 8
8 stare
stop 9
su 9
t 9