Anagram Algorithm using a hashtable and/or tries

Question

I have been searching the internet for a while for steps to find all the anagrams of a string (a word) using a hashtable and a trie; for example, "team" produces the word "tame". All I have found here on SO is how to verify that two words are anagrams. I would like to take it a step further and find an algorithm described in English so that I can program it in Java.

For example,

Loop through all the characters.
For each unique character insert into the hashtable.
and so forth.

I don't want a complete program. Yes, I am practicing for an interview. If this question comes up, I want to understand it and be able to explain it, not just have it memorized.

Answer 1:

The most succinct answer, due to someone quoted in the book "Programming Pearls", is (paraphrasing):

"sort it this way (waves hand horizontally left to right), and then that way (waves hand vertically top to bottom)"

This means: starting from a one-column table (word), create a two-column table (sorted_word, word), where sorted_word is the word with its letters sorted, then sort the table on the first column.

Now, to find the anagrams of a word, first compute its sorted form, binary search for the first occurrence of that form in the first column of the table, and read off the second-column values for as long as the first column stays the same.

Input (does not need to be sorted):

mate
tame
mote
team
tome

Sorted "this way" (horizontally, i.e. the letters within each word):

aemt, mate
aemt, tame
emot, mote
aemt, team
emot, tome

Sorted "that way" (vertically, i.e. the rows by the first column):

aemt, mate
aemt, tame
aemt, team
emot, mote
emot, tome

lookup "team" -> "aemt"

aemt, mate
aemt, tame
aemt, team
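
The table and lookup above can be sketched in Java roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `SortedAnagramTable` and its method names are made up for illustration, and it assumes that lookups should ignore case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical class name; a sketch of the two-column table described above.
class SortedAnagramTable {
    // Each row is {sortedWord, word}; rows are ordered by sortedWord.
    private final String[][] rows;

    SortedAnagramTable(List<String> words) {
        rows = new String[words.size()][];
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            // Sort "this way": the letters within each word.
            rows[i] = new String[] { sortLetters(word), word };
        }
        // Sort "that way": the rows by the first column.
        // Arrays.sort on objects is stable, so ties keep their input order.
        Arrays.sort(rows, Comparator.comparing((String[] row) -> row[0]));
    }

    // Assumption: case is ignored, so "Team" and "tame" share a key.
    static String sortLetters(String word) {
        char[] chars = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    List<String> anagramsOf(String word) {
        String key = sortLetters(word);
        // Binary search for the first row whose key is >= the lookup key.
        int lo = 0;
        int hi = rows.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rows[mid][0].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Read off the second column while the first column still matches.
        List<String> result = new ArrayList<>();
        for (int i = lo; i < rows.length && rows[i][0].equals(key); i++) {
            result.add(rows[i][1]);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedAnagramTable table = new SortedAnagramTable(
                Arrays.asList("mate", "tame", "mote", "team", "tome"));
        System.out.println(table.anagramsOf("Team")); // [mate, tame, team]
    }
}
```

Note that the result includes the looked-up word itself when it is in the table, just as in the rows listed above.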

As for hashtables and tries, they only come into the picture if you want a slightly speedier lookup. Using a hash table, you can split the two-column, vertically sorted table into k partitions based on the hash of the first column. This gives you a constant-factor speedup because you only have to binary search within one partition. Tries are a different optimization that helps you avoid doing too many string comparisons: for each terminal node in the trie, you store the index of the first row of the corresponding section of the table.
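
One way to read the trie variant is sketched below. It is only an interpretation of the description above: the class `AnagramTrieIndex`, the `Node` type, and the `firstRow` field are hypothetical names, children are kept in a `HashMap` for brevity, and case is assumed to be ignored. The trie is keyed on the sorted letters, and each terminal node remembers where its section of the sorted table starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical names throughout; a sketch, not a tuned implementation.
class AnagramTrieIndex {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int firstRow = -1; // first table row with this key, or -1 if not a terminal
    }

    private final String[] keys;  // first column: sorted letters
    private final String[] words; // second column: original words
    private final Node root = new Node();

    AnagramTrieIndex(List<String> input) {
        String[][] rows = new String[input.size()][];
        for (int i = 0; i < input.size(); i++) {
            rows[i] = new String[] { sortLetters(input.get(i)), input.get(i) };
        }
        Arrays.sort(rows, Comparator.comparing((String[] row) -> row[0]));

        keys = new String[rows.length];
        words = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            keys[i] = rows[i][0];
            words[i] = rows[i][1];
            // Walk (and extend) the trie along the sorted letters of this row.
            Node node = root;
            for (char c : keys[i].toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            if (node.firstRow < 0) {
                node.firstRow = i; // only the first row of each section is recorded
            }
        }
    }

    static String sortLetters(String word) {
        char[] chars = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    List<String> anagramsOf(String word) {
        String key = sortLetters(word);
        List<String> result = new ArrayList<>();
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return result; // no word in the table has these letters
            }
        }
        if (node.firstRow < 0) {
            return result; // the key is only a prefix of longer keys
        }
        for (int i = node.firstRow; i < keys.length && keys[i].equals(key); i++) {
            result.add(words[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        AnagramTrieIndex index = new AnagramTrieIndex(
                Arrays.asList("mate", "tame", "mote", "team", "tome"));
        System.out.println(index.anagramsOf("Team")); // [mate, tame, team]
    }
}
```

The trie walk replaces the binary search, so a lookup costs one map access per letter instead of a series of whole-string comparisons.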

Answer 2:

Hash tables are not the best solution, so I doubt you would be required to use them.

The simplest approach to finding anagram pairs (that I know of) is as follows:

Map characters as follows:

a -> 2, b -> 3, c -> 5, d -> 7

and so on, such that letters a..z are mapped to the first 26 primes.

Multiply the values of all the characters in the word; let's call the product the "anagram number". It's pretty easy to see that TEAM and TAME will produce the same number. Indeed, because prime factorization is unique, the anagram numbers of two words will be the same if and only if the words are anagrams (provided the product does not overflow the integer type used to hold it).

Thus the problem of finding anagrams between two lists reduces to finding anagram numbers that appear on both lists. This is easily done by sorting each list by anagram number and stepping through both to find common values, in O(n log n) time.
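
A minimal Java sketch of the anagram number follows. The class `AnagramNumber` and its methods are hypothetical names, and the use of `BigInteger` is an assumption added here: a `long` can overflow for longer words, whereas `BigInteger` keeps the product exact. The sketch only accepts the letters a..z and ignores case.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical class name; a sketch of the prime-product idea described above.
class AnagramNumber {
    // The first 26 primes, one per letter a..z.
    private static final int[] PRIMES = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    // Product of the primes for every letter of the word.
    static BigInteger of(String word) {
        BigInteger product = BigInteger.ONE;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only a..z is supported: " + word);
            }
            product = product.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
        }
        return product;
    }

    static boolean areAnagrams(String a, String b) {
        return of(a).equals(of(b));
    }

    private static List<BigInteger> sortedNumbers(List<String> words) {
        List<BigInteger> numbers = new ArrayList<>();
        for (String word : words) {
            numbers.add(of(word));
        }
        Collections.sort(numbers);
        return numbers;
    }

    // Sort both lists by anagram number, then step through them in parallel.
    static List<BigInteger> commonNumbers(List<String> first, List<String> second) {
        List<BigInteger> a = sortedNumbers(first);
        List<BigInteger> b = sortedNumbers(second);
        List<BigInteger> common = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Record each shared number once, even if it repeats in a list.
                if (common.isEmpty() || !common.get(common.size() - 1).equals(a.get(i))) {
                    common.add(a.get(i));
                }
                i++;
                j++;
            }
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(of("TEAM"));                  // 64042
        System.out.println(areAnagrams("team", "tame")); // true
    }
}
```

Here "team" maps to 71 * 11 * 2 * 41 = 64042, and any reordering of the same letters multiplies the same four primes.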

Answer 3:

Convert the String to a char[].
Sort the char[].
Generate a String from the sorted char[].
Use it as the key into a HashMap<String, List<String>>.
Add the original String to the list of values associated with that key.

For example, for the input

car, acr, rca, abc

the map would contain

acr: car, acr, rca
abc: abc
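
The steps above could look something like this in Java. It is a small sketch: the class name `AnagramGroups` and the method `group` are made up for illustration, and unlike a case-insensitive lookup it keeps the letters exactly as given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the sorted-key HashMap approach.
class AnagramGroups {
    static Map<String, List<String>> group(List<String> words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String word : words) {
            char[] chars = word.toCharArray(); // String to char[]
            Arrays.sort(chars);                // sort the char[]
            String key = new String(chars);    // String from the sorted char[]
            // Create the list on first use, then add the original word.
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups =
                group(Arrays.asList("car", "acr", "rca", "abc"));
        System.out.println(groups.get("acr")); // [car, acr, rca]
        System.out.println(groups.get("abc")); // [abc]
    }
}
```

To find the anagrams of a new word, sort its letters the same way and look up the resulting key in the map.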

Source: https://stackoverflow.com/questions/19600442/anagram-algorithm-using-a-hashtable-and-or-tries

Tags

java

algorithm

anagram