Find all the possible N-length anagrams - fast alternatives

為{幸葍}努か 提交于 2021-01-28 07:43:10

问题


I am given a sequence of letters and have to produce all the N-length anagrams of the sequence given, where N is the length of the sequence.

I am following a kinda naive approach in python, where I am taking all the permutations in order to achieve that. I have found some similar threads like this one but I would prefer a math-oriented approach in Python. So what would be a more performant alternative to permutations? Is there anything particularly wrong in my attempt below?

from itertools import permutations
def find_all_anagrams(word):

pp = permutations(word)
perm_set = set()
for i in pp:
    perm_set.add(i)
ll = [list(i) for i in perm_set]
ll.sort()
print(ll)

回答1:


If there are lots of repeated letters, the key will be to produce each anagram only once instead of producing all possible permutations and eliminating duplicates.

Here's one possible algorithm which only produces each anagram once:

from collections import Counter

def perm(unplaced, prefix):
  if unplaced:
    for element in unplaced:
      yield from perm(unplaced - Counter(element), prefix + element)
  else:
    yield prefix

def permutations(iterable):
  yield from perm(Counter(iterable), "")

That's actually not much different from the classic recursion to produce all permutations; the only difference is that it uses a collections.Counter (a multiset) to hold the as-yet-unplaced elements instead of just using a list.

The number of Counter objects produced in the course of the iteration is certainly excessive, and there is almost certainly a faster way of writing that; I chose this version for its simplicity and (hopefully) its clarity




回答2:


This is very slow for long words with many similar characters. Slow compared to theoretical maximum performance that is. For example, permutations("mississippi") will produce a much longer list than necessary. It will have a length of 39916800, but but the set has a size of 34650.

>>> len(list(permutations("mississippi")))
39916800
>>> len(set(permutations("mississippi")))
34650

So the big flaw with your method is that you generate ALL anagrams and then remove the duplicates. Use a method that only generates the unique anagrams.

EDIT:

Here is some working, but extremely ugly and possibly buggy code. I'm making it nicer as you're reading this. It does give 34650 for mississippi, so I assume there aren't any major bugs. Warning again. UGLY!

# Returns a dictionary with letter count
# get_letter_list("mississippi") returns 
# {'i':4, 'm':1, 'p': 2, 's':4}
def get_letter_list(word):
    w = sorted(word)
    c = 0
    dd = {}
    dd[w[0]]=1
    for l in range(1,len(w)):
        if w[l]==w[l-1]:
            d[c]=d[c]+1
            dd[w[l]]=dd[w[l]]+1
        else:
            c=c+1
            d.append(1)
            dd[w[l]]=1
    return dd

def sum_dict(d):
    s=0
    for x in d:
        s=s+d[x]
    return s

# Recursively create the anagrams. It takes a letter list
# from the above function as an argument.
def create_anagrams(dd):
    if sum_dict(dd)==1: # If there's only one letter left
        for l in dd:
            return l # Ugly hack, because I'm not used to dics
    a = []
    for l in dd:
        if dd[l] != 0:
            newdd=dict(dd)
            newdd[l]=newdd[l]-1
            if newdd[l]==0:
                newdd.pop(l)
            newl=create(newdd)
        for x in newl:
            a.append(str(l)+str(x))
    return a

>>> print (len(create_anagrams(get_letter_list("mississippi"))))
34650

It works like this: For every unique letter l, create all unique permutations with one less occurance of the letter l, and then append l to all these permutations.

For "mississippi", this is way faster than set(permutations(word)) and it's far from optimally written. For instance, dictionaries are quite slow and there's probably lots of things to improve in this code, but it shows that the algorithm itself is much faster than your approach.




回答3:


Maybe I am missing something, but why don't you just do this:

from itertools import permutations

def find_all_anagrams(word):
    return sorted(set(permutations(word)))



回答4:


You could simplify to:

from itertools import permutations

def find_all_anagrams(word):
    word = set(''.join(sorted(word)))
    return list(permutations(word))

In the doc for permutations the code is detailled and it seems already optimized.




回答5:


I don't know python but I want to try to help you: probably there are a lot of other more performant algorithm, but I've thought about this one: it's completely recursive and it should cover all the cases of a permutation. I want to start with a basic example:

permutation of ABC

Now, this algorithm works in this way: for Length times you shift right the letters, but the last letter will become the first one (you could easily do this with a queue).

Back to the example, we will have:

  • ABC
  • BCA
  • CAB

Now you repeat the first (and only) step with the substring built from the second letter to the last one.

Unfortunately, with this algorithm you cannot consider permutation with repetition.



来源:https://stackoverflow.com/questions/44852201/find-all-the-possible-n-length-anagrams-fast-alternatives

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!