Find anagram of input on set of strings..?

问题

Given a set of strings (large set), and an input string, you need to find all the anagrams of the input string efficiently. What data structure will you use. And using that, how will you find the anagrams?

Things that I have thought of are these:

Using maps

a) eliminate all words with more/less letters than the input.

b) put the input characters in map

c) Traverse the map for each string and see if all letters are present with their count.
Using Tries

a) Put all strings which have the right number of characters into a trie.

b) traverse each branch and go deeper if the letter is contained in the input.

c) if leaf reached the word is an anagram

Can anyone find a better solution?

Are there any problems that you find in the above approaches?

回答1:

Build a frequency-map from each word and compare these maps.

Pseudo code:

class Word

  string word
  map<char, int> frequency

  Word(string w)
    word = w
    for char in word
      int count = frequency.get(char)
      if count == null
        count = 0
      count++
      frequency.put(char, count)

  boolean is_anagram_of(that)
    return this.frequency == that.frequency

回答2:

You could build an hashmap where the key is sorted(word), and the value is a list of all the words that, sorted, give the corresponding key:

private Map<String, List<String>> anagrams = new HashMap<String, List<String>>();

void buildIndex(){
    for(String word : words){
        String sortedWord = sortWord(word);
        if(!anagrams.containsKey(sortedWord)){
            anagrams.put(sortedWord, new ArrayList<String>());
        }
        anagrams.get(sortedWord).add(word);
    }
}

Then you just do a lookup for the sorted word in the hashmap you just built, and you'll have the list of all the anagrams.

回答3:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
/*
 *Program for Find Anagrams from Given A string of Arrays.
 *
 *Program's Maximum Time Complexity is O(n) + O(klogk), here k is the length of word.
 *
 * By removal of Sorting, Program's Complexity is O(n) 
 *  **/
public class FindAnagramsOptimized {
    public static void main(String[] args) {
        String[] words = { "gOd", "doG", "doll", "llod", "lold", "life", 
"sandesh", "101", "011", "110" };
        System.out.println(getAnaGram(words));
    }
    // Space Complexity O(n)
    // Time Complexity O(nLogn)
    static Set<String> getAnaGram(String[] allWords) {
        // Internal Data Structure for Keeping the Values
        class OriginalOccurence {
            int occurence;
            int index;
        }
        Map<String, OriginalOccurence> mapOfOccurence = new HashMap<>();
        int count = 0;
        // Loop Time Complexity is O(n)
    // Space Complexity O(K+2K), here K is unique words after sorting on a

    for (String word : allWords) {
        String key = sortedWord(word);

        if (key == null) {
            continue;
        }
        if (!mapOfOccurence.containsKey(key)) {
            OriginalOccurence original = new OriginalOccurence();
            original.index = count;
            original.occurence = 1;
            mapOfOccurence.put(key, original);
        } else {
            OriginalOccurence tempVar = mapOfOccurence.get(key);
            tempVar.occurence += 1;
            mapOfOccurence.put(key, tempVar);
        }
        count++;
    }

    Set<String> finalAnagrams = new HashSet<>();

    // Loop works in O(K), here K is unique words after sorting on
    // characters
    for (Map.Entry<String, OriginalOccurence> anaGramedWordList : mapOfOccurence.entrySet()) {
        if (anaGramedWordList.getValue().occurence > 1) {
            finalAnagrams.add(allWords[anaGramedWordList.getValue().index]);
        }
    }

    return finalAnagrams;
}

// Array Sort works in O(nLogn)
// Customized Sorting for only chracter's works in O(n) time.
private static String sortedWord(String word) {

    // int[] asciiArray = new int[word.length()];
    int[] asciiArrayOf26 = new int[26];
    // char[] lowerCaseCharacterArray = new char[word.length()];
    // int characterSequence = 0;
    // Ignore Case Logic written in lower level
    for (char character : word.toCharArray()) {
        if (character >= 97 && character <= 122) {
            // asciiArray[characterSequence] = character;
            if (asciiArrayOf26[character - 97] != 0) {
                asciiArrayOf26[character - 97] += 1;
            } else {
                asciiArrayOf26[character - 97] = 1;
            }
        } else if (character >= 65 && character <= 90) {
            // asciiArray[characterSequence] = character + 32;
            if (asciiArrayOf26[character + 32 - 97] != 0) {
                asciiArrayOf26[character + 32 - 97] += 1;
            } else {
                asciiArrayOf26[character + 32 - 97] = 1;
            }
        } else {
            return null;
        }

        // lowerCaseCharacterArray[characterSequence] = (char)
        // asciiArray[characterSequence];
        // characterSequence++;
    }
    // Arrays.sort(lowerCaseCharacterArray);

    StringBuilder sortedWord = new StringBuilder();
    int asciiToIndex = 0;
    // This Logic uses for reading the occurrences from array and copying
    // back into the character array
    for (int asciiValueOfCharacter : asciiArrayOf26) {
        if (asciiValueOfCharacter != 0) {
            if (asciiValueOfCharacter == 1) {
                sortedWord.append((char) (asciiToIndex + 97));
            } else {
                for (int i = 0; i < asciiValueOfCharacter; i++) {
                    sortedWord.append((char) (asciiToIndex + 97));
                }
            }
        }
        asciiToIndex++;
    }
    // return new String(lowerCaseCharacterArray);
    return sortedWord.toString();
}
}

来源：https://stackoverflow.com/questions/8971688/find-anagram-of-input-on-set-of-strings

标签

algorithm

map

trie