I have a large dictionary constructed like so:
programs[\'New York\'] = \'some values...\'
programs[\'Port Authority of New York\'] = \'some values...\'
pr
You could generate all substrings ahead of time, and map them to their respective keys.
#generates all substrings of s.
def genSubstrings(s):
#yield all substrings that contain the first character of the string
for i in range(1, len(s)+1):
yield s[:i]
#yield all substrings that don't contain the first character
if len(s) > 1:
for j in genSubstrings(s[1:]):
yield j
keys = ["New York", "Port Authority of New York", "New York City"]
substrings = {}
for key in keys:
for substring in genSubstrings(key):
if substring not in substrings:
substrings[substring] = []
substrings[substring].append(key)
Then you can query substrings
to get the keys that contain that substring:
>>>substrings["New York"]
['New York', 'Port Authority of New York', 'New York City']
>>> substrings["of New York"]
['Port Authority of New York']
Pros:
Cons:
substrings
incurs a one-time cost at the beginning of your program, taking time proportional to the number of keys in programs
.substrings
will grow approximately linearly with the number of keys in programs
, increasing the memory usage of your script.genSubstrings
has O(n^2) performance in relation to the size of your key. For example, "Port Authority of New York" generates 351 substrings.