Given a string array, return all groups of strings that are anagrams

My solution:

For each word in the array, sort its letters in O(m lg m), where m is the average length of a word.

Build a hash table <string, list>.

Use the sorted word as the key in the hash table. Then generate all permutations of the word (O(m!)) and look up each permutation in a dictionary (a prefix tree map) in O(m). If a permutation is in the dictionary, append it (O(1)) to the list under that key, so that all permuted words end up in the list with the same key.

In total, this takes O(n * m * lg m * m!) time and O(n * m!) space, where n is the size of the given array.
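For reference, the permutation step described above might look roughly like the sketch below. The helper name anagramsByPermutation is made up for illustration, and a std::unordered_set stands in for the prefix tree dictionary, which is an assumption and not what the question uses.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns every distinct permutation of `word` that
// appears in `dictionary`. Visits up to m! permutations for a word of length m.
// Assumption: the dictionary is a hash set rather than a prefix tree.
std::vector<std::string> anagramsByPermutation(
    std::string word, const std::unordered_set<std::string>& dictionary) {
  std::vector<std::string> found;
  // Start from the smallest arrangement so that std::next_permutation
  // walks through every distinct permutation exactly once.
  std::sort(word.begin(), word.end());
  do {
    if (dictionary.count(word) > 0) {
      found.push_back(word);
    }
  } while (std::next_permutation(word.begin(), word.end()));
  return found;
}
```

The do/while loop is where the m! factor comes from: every arrangement of the letters is generated, whether or not it is a real word.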

If m is very large, this is not efficient because of the m! factor.

Are there any better solutions?

Thanks.

We define an alphabet that contains every letter we may have in our wordlist. Next, we need a different prime for each letter in the alphabet; I recommend using the smallest primes you can find.

That would give us the following mapping: { a => 2, b => 3, c => 5, d => 7, etc }

Now count the letters in the word you want to represent as an integer, and build your result integer as follows:

Pseudocode:

result = 1
for each distinct letter in word:
    result *= power(prime[letter], count(letter, word))

Some examples:

aaaa => 2^4

aabb => 2^2 * 3^2, which is the same value as for bbaa, baba, ...

and so on.

So you will have an integer representing each word in your dictionary, and the word you want to check can be converted to an integer in the same way. If n is the size of your wordlist and k is the length of the longest word, it will take O(nk) to build your new dictionary and O(k) to check a new word.
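A minimal C++ sketch of the prime mapping, assuming lowercase ASCII words only. The names kPrimes and primeKey are made up for illustration. One caveat worth stating: the product grows quickly, and a word of ten 'z' characters (101^10) already exceeds 64 bits, so this sketch reports overflow instead of silently wrapping around.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// First 26 primes, one per lowercase letter: a => 2, b => 3, c => 5, ...
static const std::uint64_t kPrimes[26] = {
    2,  3,  5,  7,  11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

// Hypothetical helper: stores the product of the letter primes in `key`.
// Returns false if the product would not fit in 64 bits.
bool primeKey(const std::string& word, std::uint64_t& key) {
  key = 1;
  for (char c : word) {
    assert(c >= 'a' && c <= 'z');  // assumption: lowercase ASCII only
    const std::uint64_t p = kPrimes[c - 'a'];
    if (key > UINT64_MAX / p) {
      return false;  // multiplying by p would overflow
    }
    key *= p;
  }
  return true;
}
```

Multiplying once per character gives the same result as raising each prime to its letter count, so aabb and baba both map to 36. For long words a big-integer type or a different key would be needed.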

Hackthissite.com has a programming challenge which is: given a scrambled word, look it up in a dictionary to see if any anagrams of it are in the dictionary. There is a good article on an efficient solution to the problem, from which I have borrowed the answer. It also goes into detail on further optimisations.

Use counting sort to sort each word so that sorting can be done in O(m). After sorting, generate a key from the word and insert a (key, value) node into the hash table. Generating the key can be done in O(m).

The value in each (key, value) pair can be a dynamic array that holds more than one string. Each time you insert a key that is already present, just push the original word from which the key was generated onto the value array.

So the overall time complexity is O(mn), where n is the total number of words (the size of the input).
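A rough sketch of this approach, assuming the words contain lowercase ASCII letters only (so 26 counters are enough). The function names countingSortKey and groupAnagrams are made up for illustration, and the O(mn) bound relies on expected O(1) hash table operations on top of hashing each key in O(m).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: sorts the letters of `word` with a counting sort.
// Runs in O(m) for a word of length m.
std::string countingSortKey(const std::string& word) {
  std::size_t counts[26] = {0};
  for (char c : word) {
    assert(c >= 'a' && c <= 'z');  // assumption: lowercase ASCII only
    ++counts[c - 'a'];
  }
  std::string key;
  key.reserve(word.size());
  for (int i = 0; i < 26; ++i) {
    key.append(counts[i], static_cast<char>('a' + i));
  }
  return key;
}

// Groups words by their sorted-letter key; each value is a dynamic array
// holding the original words that produced that key.
std::unordered_map<std::string, std::vector<std::string>> groupAnagrams(
    const std::vector<std::string>& words) {
  std::unordered_map<std::string, std::vector<std::string>> groups;
  for (const std::string& word : words) {
    groups[countingSortKey(word)].push_back(word);
  }
  return groups;
}
```

Entries whose value array holds a single word have no anagram partner in the input and can be filtered out if only real groups are wanted.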

This link also has solutions to similar problems: http://yourbitsandbytes.com/viewtopic.php?f=10&t=42

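#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
  std::string word;
  // Maps the sorted letters of a word to every input word with those letters.
  std::map<std::string, std::set<std::string>> anagrams;
  // Read whitespace-separated words from standard input.
  while (std::cin >> word) {
    std::string sortedWord(word);
    std::sort(sortedWord.begin(), sortedWord.end());
    anagrams[sortedWord].insert(word);
  }
  // Print one group per line; words without a partner form a group of one.
  for (const auto& pair : anagrams) {
    for (const auto& member : pair.second) {
      std::cout << member << " ";
    }
    std::cout << "\n";
  }
}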

I'll let someone who is better at big-O analysis than I am figure out the complexities.

Turn the dictionary into a mapping from the sorted characters of a word to every word made of those characters, and store that. For each word you are given, sort it and add the list of anagrams from the mapping to your output.
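One way this could look in C++, as a sketch. AnagramIndex, sortedLetters, buildIndex and lookupAnagrams are names invented here, and the dictionary is assumed to fit in memory as a vector of strings.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

using AnagramIndex = std::unordered_map<std::string, std::vector<std::string>>;

// Hypothetical helper: the letters of `word` in sorted order.
std::string sortedLetters(std::string word) {
  std::sort(word.begin(), word.end());
  return word;
}

// Builds the mapping from sorted letters to every dictionary word
// made of exactly those letters. Done once and then reused.
AnagramIndex buildIndex(const std::vector<std::string>& dictionary) {
  AnagramIndex index;
  for (const std::string& entry : dictionary) {
    index[sortedLetters(entry)].push_back(entry);
  }
  return index;
}

// Returns the dictionary words that are anagrams of `word` (empty if none).
std::vector<std::string> lookupAnagrams(const AnagramIndex& index,
                                        const std::string& word) {
  const auto it = index.find(sortedLetters(word));
  if (it == index.end()) {
    return {};
  }
  return it->second;
}
```

With a dictionary of listen, silent, enlist and google, looking up tinsel returns the first three words, because all four share the sorted key eilnst.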

I don't believe you can do better in big-O terms than:

sorting the letters of each word, then
sorting the list of sorted words.

Each set of anagrams will then be grouped consecutively.
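The two sorting steps can be sketched as follows. The function name groupBySorting is made up, and pairing each sorted key with its original word is an assumption added here so that the original spelling can be recovered after the second sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: groups anagrams by sorting (key, word) pairs so that
// words with the same sorted letters end up next to each other.
std::vector<std::vector<std::string>> groupBySorting(
    const std::vector<std::string>& words) {
  std::vector<std::pair<std::string, std::string>> keyed;
  keyed.reserve(words.size());
  for (const std::string& word : words) {
    std::string key = word;
    std::sort(key.begin(), key.end());  // step 1: sort the letters of each word
    keyed.emplace_back(key, word);
  }
  std::sort(keyed.begin(), keyed.end());  // step 2: sort the list of sorted words

  std::vector<std::vector<std::string>> groups;
  for (std::size_t i = 0; i < keyed.size(); ++i) {
    // Equal keys are now consecutive, so a changed key starts a new group.
    if (i == 0 || keyed[i].first != keyed[i - 1].first) {
      groups.emplace_back();
    }
    groups.back().push_back(keyed[i].second);
  }
  return groups;
}
```

For the input tea, eat, tan, ate, nat, bat this yields three groups: bat, then ate eat tea, then nat tan. Single-word groups are kept in this sketch and can be dropped if only real anagram groups are wanted.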

Source: https://stackoverflow.com/questions/8538924/given-a-string-array-return-all-groups-of-strings-that-are-anagrams

Tags: c++, algorithm, data-structures, anagram