Anagram Algorithm using a hashtable and/or tries

谁说胖子不能爱 submitted on 2019-12-04 23:26:48

The most succinct answer, attributed to someone quoted in the book "Programming Pearls", is (paraphrasing):

"sort it this way (waves hand horizontally left to right), and then that way (waves hand vertically top to bottom)"

This means: starting from a one-column table (word), create a two-column table (sorted_word, word), where sorted_word is the word with its letters sorted, and then sort the table on the first column.

Now, to find the anagrams of a word, first compute its sorted word, binary-search for the first occurrence of that value in the first column of the table, and read off the second-column values for as long as the first column stays the same.

input (does not need to be sorted):

mate
tame
mote
team
tome

sorted "this way" (horizontally, i.e. the letters within each word):

aemt, mate
aemt, tame
emot, mote
aemt, team
emot, tome

sorted "that way" (vertically, i.e. the rows by first column):

aemt, mate
aemt, tame
aemt, team
emot, mote
emot, tome

lookup "team" -> "aemt"

aemt, mate
aemt, tame
aemt, team
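The two sorts and the lookup above can be sketched in a few lines of Python. This is only an illustration under my own naming: build_table and find_anagrams are hypothetical helper names, not anything from the book or the answer.

```python
from bisect import bisect_left

def build_table(words):
    """Return (sorted_word, word) rows, sorted on the first column."""
    # "This way": sort the letters within each word.
    rows = [("".join(sorted(w)), w) for w in words]
    # "That way": sort the rows so equal sorted_words become adjacent.
    rows.sort()
    return rows

def find_anagrams(rows, word):
    """Binary-search for the sorted word, then read off the matching rows."""
    key = "".join(sorted(word))
    # The 1-tuple (key,) compares lower than every (key, word) row,
    # so bisect_left lands on the first row whose first column is key.
    i = bisect_left(rows, (key,))
    found = []
    while i < len(rows) and rows[i][0] == key:
        found.append(rows[i][1])
        i += 1
    return found

table = build_table(["mate", "tame", "mote", "team", "tome"])
print(find_anagrams(table, "team"))  # ['mate', 'tame', 'team']
```

Note that the looked-up word itself appears in the result if it is in the table; filter it out if you only want the other anagrams.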

As for hash tables and tries: they only come into the picture if you want a slightly speedier lookup. Using a hash table, you can split the two-column, vertically sorted table into k partitions based on the hash of the first column. This gives you a small speedup, because the binary search only has to cover one partition (roughly n/k rows instead of n, if the hash spreads the keys evenly). Tries are a different optimization that helps you avoid doing too many string comparisons: for each terminal in the trie, you hang off the index of the first row of the corresponding section of the table.
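A rough sketch of the hash-partition idea follows. The partition count K, the use of crc32 as the hash, and the function names are all assumptions made for illustration; any stable hash of the sorted word would do.

```python
from bisect import bisect_left
from zlib import crc32

K = 4  # number of partitions; an arbitrary value chosen for this sketch

def signature(word):
    """The word with its letters sorted (the table's first column)."""
    return "".join(sorted(word))

def partition_of(sig):
    # crc32 is only a stand-in for "some stable hash of the first column".
    return crc32(sig.encode("utf-8")) % K

def build_partitions(words):
    """Distribute (sorted_word, word) rows over K partitions, each sorted."""
    parts = [[] for _ in range(K)]
    for w in words:
        sig = signature(w)
        parts[partition_of(sig)].append((sig, w))
    for rows in parts:
        rows.sort()
    return parts

def find_anagrams(parts, word):
    """Hash to one partition, then binary-search only within it."""
    sig = signature(word)
    rows = parts[partition_of(sig)]
    i = bisect_left(rows, (sig,))
    found = []
    while i < len(rows) and rows[i][0] == sig:
        found.append(rows[i][1])
        i += 1
    return found

parts = build_partitions(["mate", "tame", "mote", "team", "tome"])
print(find_anagrams(parts, "team"))  # ['mate', 'tame', 'team']
```

All words with the same sorted form hash to the same partition, so no anagram can be missed by searching only one of them.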

Hash tables are not the best solution, so I doubt you would be required to use them.

The simplest approach to finding anagram pairs (that I know of) is as follows:

Map characters as follows:

a -> 2
b -> 3
c -> 5
d -> 7

and so on, such that letters a..z are mapped to the first 26 primes.

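Multiply together the values of all the characters in the word; let's call the result the "anagram number". It's pretty easy to see that TEAM and TAME will produce the same number. Indeed, because every integer has a unique prime factorization, the anagram numbers of two different words will be the same if and only if the words are anagrams. (For long words the product can overflow a fixed-width integer, so use a big-integer type where that matters.)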

Thus the problem of finding anagrams between the two lists reduces to finding anagram numbers that appear on both lists. This is easily done by sorting each list by anagram number and stepping through both to find common values, in O(n log n) time.
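Here is a minimal sketch of the prime-product approach, assuming words contain only the letters a..z (case-insensitive). The names anagram_number and common_anagrams are made up for this example. Python integers are arbitrary-precision, so overflow is not an issue here.

```python
from string import ascii_lowercase

# The first 26 primes, assigned to a..z in order.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = dict(zip(ascii_lowercase, PRIMES))

def anagram_number(word):
    """Product of the primes assigned to each letter of the word."""
    n = 1
    for ch in word.lower():
        n *= PRIME_OF[ch]  # raises KeyError for anything outside a..z
    return n

def common_anagrams(list_a, list_b):
    """Pairs (word_a, word_b) that are anagrams, found by sort and merge."""
    a = sorted((anagram_number(w), w) for w in list_a)
    b = sorted((anagram_number(w), w) for w in list_b)
    pairs = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1
        elif a[i][0] > b[j][0]:
            j += 1
        else:
            # Same anagram number: find the full run in each list.
            num = a[i][0]
            i_end = i
            while i_end < len(a) and a[i_end][0] == num:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][0] == num:
                j_end += 1
            pairs.extend((wa, wb) for _, wa in a[i:i_end]
                                  for _, wb in b[j:j_end])
            i, j = i_end, j_end
    return pairs

print(common_anagrams(["team", "mote"], ["tame", "abc", "tome"]))
# [('team', 'tame'), ('mote', 'tome')]
```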

  • convert the String to a char[]
  • sort the char[]
  • build a new String from the sorted char[]
  • use it as the key into a HashMap<String, List<String>>
  • add the original String to the list of values associated with that key

For example, given

car, acr, rca, abc

the map would contain:

acr: car, acr, rca
abc: abc
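The same steps, sketched in Python rather than Java: a dict of lists plays the role of the HashMap<String, List<String>>. The function name group_anagrams is hypothetical.

```python
from collections import defaultdict

def group_anagrams(words):
    """Group words under the key formed by sorting their characters."""
    groups = defaultdict(list)
    for w in words:
        key = "".join(sorted(w))  # sorted characters joined back into a string
        groups[key].append(w)     # add the original word under that key
    return dict(groups)

print(group_anagrams(["car", "acr", "rca", "abc"]))
# {'acr': ['car', 'acr', 'rca'], 'abc': ['abc']}
```

Every list with more than one entry is a set of mutual anagrams, and looking up a word's anagrams is a single hash lookup on its sorted form.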