How can I optimize this Python code to generate all words with word-distance 1?

予麋鹿 2021-01-30 22:11

Profiling shows this is the slowest segment of my code for a little word game I wrote:

def distance(word1, word2):
    difference = 0
    for i in range(len(word1)):
        if word1[i] != word2[i]:
            difference += 1
    return difference

12 answers
  •  时光取名叫无心
    2021-01-30 22:33

    How often is the distance function called with the same arguments? A simple-to-implement optimization would be to use memoization.
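
    A minimal Python 3 sketch of memoization, assuming the function keeps the question's distance(word1, word2) signature and is only called on equal-length words. functools.lru_cache stores each result keyed on the argument tuple, so a repeated pair skips the letter-by-letter loop.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache keyed on the (word1, word2) tuple
def distance(word1, word2):
    """Hamming distance between two equal-length words."""
    difference = 0
    for a, b in zip(word1, word2):
        if a != b:
            difference += 1
    return difference

print(distance("hello", "belko"))  # 2 (computed)
print(distance("hello", "belko"))  # 2 (returned from the cache)
print(distance.cache_info())       # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```

    This only pays off if the same pairs really recur. Note that distance(a, b) and distance(b, a) are cached as separate entries, and maxsize=None lets the cache grow without bound; a finite maxsize caps memory at the cost of evictions.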

    You could probably also build a dictionary keyed by frozensets of letters that maps to lists of words differing by one letter, and look values up in that. This data structure could either be stored and loaded through pickle or generated from scratch at startup.
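
    The frozenset layout isn't spelled out above, so the sketch below swaps in a common variant of the same precomputed-lookup idea: key a dictionary by each word with one position replaced by a wildcard, so two distinct equal-length words share a bucket exactly when they differ in one position. The names build_neighbor_index, neighbors and WILDCARD are hypothetical, and it assumes no word contains "_". The pickle round trip stands in for saving the index once and loading it at startup.

```python
import pickle
from collections import defaultdict

# Hypothetical wildcard character; assumes real words never contain it.
WILDCARD = "_"

def build_neighbor_index(words):
    """Map each one-position wildcard pattern to the words that match it.

    "hello" is filed under "_ello", "h_llo", "he_lo", "hel_o" and "hell_".
    """
    index = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            index[word[:i] + WILDCARD + word[i + 1:]].append(word)
    return dict(index)

def neighbors(word, index):
    """Words at Hamming distance exactly 1: the other members of word's buckets."""
    found = set()
    for i in range(len(word)):
        for other in index.get(word[:i] + WILDCARD + word[i + 1:], ()):
            if other != word:
                found.add(other)
    return found

words = ["hello", "hallo", "hullo", "belko", "help", "yelp"]
index = build_neighbor_index(words)

# Persist once, reload later instead of rebuilding.
# (With a file: pickle.dump(index, f) to save, pickle.load(f) at startup.)
blob = pickle.dumps(index)
index = pickle.loads(blob)

print(sorted(neighbors("hello", index)))  # ['hallo', 'hullo']
print(sorted(neighbors("help", index)))   # ['yelp']
```

    The trade-off is memory: the index holds one entry per letter of every word, but each query costs only len(word) dictionary lookups instead of a comparison against the whole word list.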

    Short-circuiting the evaluation will only give you gains if the words you are using are very long, since the Hamming distance algorithm you're using is basically O(n), where n is the word length, and an early exit can only skip the letters after the second mismatch.
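
    A rough Python 3 sketch of that trade-off, timing a full count against an early-exit check on a short pair and on a long pair whose mismatches come first. The inputs and repetition counts are arbitrary choices for illustration, and timings vary by machine, so none are shown here.

```python
import timeit

def distance(word1, word2):
    # Full scan: always compares every position.
    return sum(a != b for a, b in zip(word1, word2))

def distance_is_one(word1, word2):
    # Early exit: stops as soon as a second mismatch is found.
    diff = 0
    for a, b in zip(word1, word2):
        if a != b:
            diff += 1
            if diff > 1:
                return False
    return diff == 1

cases = {
    # 5-letter words: there is little left to skip after the second mismatch.
    "short": ("hello", "belko"),
    # 2,000-letter strings differing in the first two positions:
    # the early exit skips almost the entire string.
    "long": ("ab" + "x" * 1998, "ba" + "x" * 1998),
}

for label, (a, b) in cases.items():
    full = timeit.timeit(lambda: distance(a, b) == 1, number=2_000)
    early = timeit.timeit(lambda: distance_is_one(a, b), number=2_000)
    print(f"{label:>5}: full scan {full:.4f}s, early exit {early:.4f}s")
```

    The long case is the best case for short-circuiting; if the second mismatch sits near the end of the word, both versions do about the same work.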

    I did some experiments with timeit for some alternative approaches that may be illustrative.

    Timeit Results

    Your Solution

    import timeit

    d = """\
    def distance(word1, word2):
        difference = 0
        for i in range(len(word1)):
            if word1[i] != word2[i]:
                difference += 1
        return difference
    """
    t1 = timeit.Timer('distance("hello", "belko")', d)
    print t1.timeit() # prints 6.502113536776391
    

    One-Liner

    d = """\
    from itertools import izip
    def hamdist(s1, s2):
        return sum(ch1 != ch2 for ch1, ch2 in izip(s1,s2))
    """
    t2 = timeit.Timer('hamdist("hello", "belko")', d)
    print t2.timeit() # prints 10.985101179
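
    The timings above are Python 2 (print statement, izip, xrange). A minimal Python 3 port of the one-liner, for reference: itertools.izip no longer exists and the built-in zip is already lazy. The length check is an added choice, since zip silently stops at the shorter string.

```python
import timeit

def hamdist(s1, s2):
    """Hamming distance between two equal-length strings (Python 3 port)."""
    # zip stops at the shorter input, so reject unequal lengths explicitly
    # rather than ignoring the extra letters.
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

print(hamdist("hello", "belko"))  # 2

# Python 3's timeit can read names from globals() instead of a setup string.
print(timeit.timeit('hamdist("hello", "belko")', globals=globals(), number=100_000))
```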
    

    Shortcut Evaluation

    d = """\
    def distance_is_one(word1, word2):
        diff = 0
        for i in xrange(len(word1)):
            if word1[i] != word2[i]:
                diff += 1
            if diff > 1:
                return False
        return diff == 1
    """
    t3 = timeit.Timer('distance_is_one("hello", "belko")', d)
    print t3.timeit() # prints 6.63337
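
    To tie this back to the question's goal of generating every word at distance 1, here is a Python 3 sketch with two hypothetical helpers. children_by_scan applies the short-circuit check to every word in the list, as the timed code does. children_by_lookup is a different technique, not one timed above: it generates every one-letter substitution and keeps those found in a set, assuming lowercase ASCII words.

```python
import string

def distance_is_one(word1, word2):
    """True if equal-length words differ in exactly one position."""
    diff = 0
    for a, b in zip(word1, word2):
        if a != b:
            diff += 1
            if diff > 1:
                return False  # short-circuit at the second mismatch
    return diff == 1

def children_by_scan(word, wordlist):
    """Compare against every word: O(len(wordlist) * len(word))."""
    return [w for w in wordlist if len(w) == len(word) and distance_is_one(word, w)]

def children_by_lookup(word, wordset, alphabet=string.ascii_lowercase):
    """Try every single-letter substitution: O(len(alphabet) * len(word)) set lookups."""
    found = set()
    for i, original in enumerate(word):
        for ch in alphabet:
            if ch != original:
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in wordset:
                    found.add(candidate)
    return found

words = ["hello", "hallo", "hullo", "belko", "help", "yelp"]
print(children_by_scan("hello", words))                 # ['hallo', 'hullo']
print(sorted(children_by_lookup("hello", set(words))))  # ['hallo', 'hullo']
```

    The lookup version's cost does not depend on the size of the word list, which tends to matter more than micro-optimizing the per-pair distance once the list is large.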
    
