Create lists of anagrams from a list of words

问题

I want to find create lists of anagrams from a list of words. Should I use another loop in my code or recursion?

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

new_list = [some_list[0]]
i = 0
while i+1 < len(some_list):
    if (''.join(sorted(some_list[0]))) == (''.join(sorted(some_list[i+1]))):
        new_list.append(some_list[i+1])
        i = i+1
    else:
        i = i+1

print(new_list)

My output is ['bad', 'dab', 'bda', 'dba']. But I also want more lists of other anagrams from some_list.

I want the output to be: - ['app', 'ppa'] - ['bad', 'dab', 'bda', 'dba'] - ['sad', 'das']

回答1:

I recommend you write Python, not Java or whatever other language you're emulating there. Here's your core code in Python, with normal looping and without all the unnecessary stuff:

new_list = [some_list[0]]
for word in some_list[1:]:
    if sorted(some_list[0]) == sorted(word):
        new_list.append(word)

I don't see use for recursion, but yes, you could wrap an outer loop around this to find the other anagram groups.

Though this is how I'd do it, using the helpful itertools.groupby:

for _, group in groupby(sorted(some_list, key=sorted), sorted):
    group = list(group)
    if len(group) > 1:
        print(group)

That prints:

['bad', 'dab', 'bda', 'dba']
['sad', 'das']
['app', 'ppa']

Alternative solution for the changed question with sorting the groups:

groups = (list(group) for _, group in groupby(sorted(some_list, key=sorted), sorted))
print([group for group in sorted(groups) if len(group) > 1])

Output:

[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

回答2:

Your problem is that you loop one time over your list ,since you need to loop based on all of the words.

But i suggest another way for this task,you can use itertools.groupby and sorted function using operator.itemgetter :

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

from operator import itemgetter
from itertools import groupby 
s=sorted([(i,''.join(sorted(j))) for i,j in enumerate(some_list)],key=itemgetter(1))
inds= [zip(*g)[0] for _,g in groupby(s,itemgetter(1))]
print [itemgetter(*i)(some_list) for i in inds]

Result :

[('bad', 'dab', 'bda', 'dba'), 'mad', ('sad', 'das'), ('app', 'ppa'), 'pge']

All that i have done here is creating a list of sorted words with those index using sorted and enumerate :

sorted([(i,''.join(sorted(j))) for i,j in enumerate(some_list)],key=itemgetter(1))
[(0, 'abd'), (4, 'abd'), (6, 'abd'), (9, 'abd'), (3, 'adm'), (2, 'ads'), (8, 'ads'), (1, 'app'), (7, 'app'), (5, 'egp')]

then we need to grouping this pairs based on the second element and get the first element (indices) so we will have the following list of tuples :

[(0, 4, 6, 9), (3,), (2, 8), (1, 7), (5,)]

that each tuple is contain the indices of the words that those sorted representations are same.

and at last all you need is picking up the elements of the main list based on the preceding indices :

[itemgetter(*i)(some_list) for i in inds]
[('bad', 'dab', 'bda', 'dba'), 'mad', ('sad', 'das'), ('app', 'ppa'), 'pge']

回答3:

1) Create a function anagrams(word) that is returning a list of anagrams for a single word as your code does.
2) map the function over your list of words.

回答4:

Here is a solution:

from itertools import groupby
some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']
some_list_ordered = map( lambda x : "".join( sorted( x) ), some_list )
some_lists = sorted(zip( some_list_ordered, some_list ) )
anagrams = filter( lambda x : len( x ) > 1, [ zip(*v)[1]  for k,v in groupby( some_lists, lambda x : x[0] ) ] )    

for a in anagrams:
    print a

#('bad', 'bda', 'dab', 'dba')
#('das', 'sad')
#('app', 'ppa')

回答5:

The natural way to do this if you can afford the memory overhead of an additional dictionary seems to me to be:

words = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

anagrams = {}
for word in words:
    sword = ''.join(sorted(word))
    try:
        anagrams[sword].append(word)
    except KeyError:
        anagrams[sword] = [word]

anagrams_list = [v for v in anagrams.values() if len(v) > 1]
print anagrams_list

Output:

[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

EDIT: As mentioned in the comment below you can replace the try...except block with the dict method setdefault if the syntax doesn't bother you:

words = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

anagrams = {}
for word in words:
    sword = ''.join(sorted(word))
    anagrams.setdefault(sword, []).append(word)

anagrams_list = [v for v in anagrams.values() if len(v) > 1]
print anagrams_list

回答6:

You can group the words in a dict, using the sorted word as the key, filtering out the words that values that don't have at least two elements, using an OrderedDict to keep order:

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']


from collections import OrderedDict

od = OrderedDict()
for ele in some_list:
    srt = "".join(sorted(ele))
    od.setdefault(srt,[]).append(ele)

print(filter(lambda x: len(x) > 1, od.values()))


[['bad', 'dab', 'bda', 'dba'], ['app', 'ppa'], ['sad', 'das']]

Or using a loop and appending to a list, using a temp list to gather common words:

new_list = []
from collections import OrderedDict
for ele in OrderedDict.fromkeys("".join(sorted(ele)) for ele in some_list):
    temp = []
    for s in some_list:
        if ele == ''.join(sorted(s)):
            temp.append(s)
    if len(temp) > 1:
        new_list.append(temp)

If order does not matter a defaultdict would be more efficient:

from collections import defaultdict

d = defaultdict(list)
for ele in some_list:
    d[''.join(sorted(ele))].append(ele)

print(filter(lambda x: len(x) > 1, d.values()))
[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

来源：https://stackoverflow.com/questions/30246225/create-lists-of-anagrams-from-a-list-of-words

标签

python

list

anagram