Profiling shows this is the slowest segment of my code for a little word game I wrote:
def distance(word1, word2):
difference = 0
for i in range(len(word
How often is the distance function called with the same arguments? A simple to implement optimization would be to use memoization.
You could probably also create some sort of dictionary with frozensets of letters and lists of words that differ by one and look up values in that. This datastructure could either be stored and loaded through pickle or generated from scratch at startup.
Short circuiting the evaluation will only give you gains if the words you are using are very long, since the hamming distance algorithm you're using is basically O(n) where n is the word length.
I did some experiments with timeit for some alternative approaches that may be illustrative.
d = """\
def distance(word1, word2):
difference = 0
for i in range(len(word1)):
if word1[i] != word2[i]:
difference += 1
return difference
"""
t1 = timeit.Timer('distance("hello", "belko")', d)
print t1.timeit() # prints 6.502113536776391
d = """\
from itertools import izip
def hamdist(s1, s2):
return sum(ch1 != ch2 for ch1, ch2 in izip(s1,s2))
"""
t2 = timeit.Timer('hamdist("hello", "belko")', d)
print t2.timeit() # prints 10.985101179
d = """\
def distance_is_one(word1, word2):
diff = 0
for i in xrange(len(word1)):
if word1[i] != word2[i]:
diff += 1
if diff > 1:
return False
return diff == 1
"""
t3 = timeit.Timer('hamdist("hello", "belko")', d)
print t2.timeit() # prints 6.63337