Given a set of words, we need to find the anagrams among them and display each group of anagrams separately, using the best algorithm.
input:
man car kile arc none like
I will generate the hash map from the sample word only; I don't care about the rest of the alphabet.
For example, if the word is "car", my hash table will look like this: a,0 b,MAX c,1 d,MAX e,MAX ... r,2. As a result, any hash greater than 3 is treated as not matching.
(More tuning:) my comparison method checks the running hash total inside the hash calculation itself, so it stops as soon as it can tell that the word does not match.
// Maps each letter of the sample word to its position in that word.
public static HashMap<String, Integer> getHashMap(String word) {
    HashMap<String, Integer> map = new HashMap<String, Integer>();
    String[] chars = word.split("");
    int index = 0;
    for (String c : chars) {
        map.put(c, index);
        index++;
    }
    return map;
}
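// Sums the mapped positions of the word's letters, stopping early once
// the total exceeds base or a letter is missing from the map.
public static int alphaHash(String word, int base,
        HashMap<String, Integer> map) {
    int result = 0;
    for (String c : word.split("")) {
        if (c.isEmpty()) {
            continue; // split("") may yield an empty leading element
        }
        Integer index = map.get(c);
        if (index == null) {
            // Letter not in the sample word: it cannot match. Return here
            // rather than adding MAX_VALUE, which would overflow the int.
            return Integer.MAX_VALUE;
        }
        result += index;
        if (result > base) {
            return result; // already larger than the sample hash
        }
    }
    return result;
}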
Main method:
// sample is the reference word the other words are compared against
HashMap<String, Integer> map = getHashMap(sample);
int sampleHash = alphaHash(sample, Integer.MAX_VALUE, map);
for (String s : args) {
    if (sampleHash == alphaHash(s, sampleHash, map)) {
        System.out.print(s + " ");
    }
}
I don't think you'll find anything better than a hash table with a custom hash function (one that sorts the letters of the word before hashing it).
Sum of the letters will never work, because you can't really make 'ac' and 'bb' different.
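To make that collision concrete, here is a minimal Java sketch. The class name `LetterSumDemo` and the helper `letterSum` are made up for illustration; they are not from the question's code.

```java
public class LetterSumDemo {
    // Hypothetical helper: adds up the character codes of a word.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 'a' + 'c' = 97 + 99 and 'b' + 'b' = 98 + 98, so both sums are 196
        // even though "ac" and "bb" are not anagrams of each other.
        System.out.println(letterSum("ac")); // 196
        System.out.println(letterSum("bb")); // 196
    }
}
```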
I wouldn't use hashing, since it adds complexity to both look-ups and insertions. Hashing, sorting and multiplication are all going to be slower than a simple array-based histogram that also tracks the number of unique characters. The worst case is 2n character visits for strings of length n, i.e. O(n):
// structured for clarity
static bool isAnagram(String s1, String s2)
{
    // assumes character codes 0-255; larger code points would index out of range
    int[] histogram = new int[256];
    int uniques = 0;

    // scan first string
    foreach (int c in s1)
    {
        // count occurrence
        int count = ++histogram[c];

        // count uniques
        if (count == 1)
        {
            ++uniques;
        }
    }

    // scan second string
    foreach (int c in s2)
    {
        // reverse count occurrence
        int count = --histogram[c];

        // reverse count uniques
        if (count == 0)
        {
            --uniques;
        }
        else if (count < 0) // trivial reject of longer strings or more occurrences
        {
            return false;
        }
    }

    // final histogram unique count should be 0
    return (uniques == 0);
}
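Anagrams can be found in the following way:
The length of both words should match.
The sum of the character values of both words should match.
The product of the character values of both words should match.
So I thought that, with the above three validations, we can find anagrams. Correct me if I'm wrong.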
Example: abc cba
Length of both words is 3.
Sum of individual characters for both words is 294.
Prod of individual characters for both words is 941094.
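The numbers in this example can be reproduced with a short Java sketch. The names `ThreeChecksDemo`, `charSum`, `charProduct` and `passesChecks` are hypothetical, and `BigInteger` is my own choice so that the product cannot overflow on longer words. The sketch only evaluates the three checks; it does not show that passing them is enough to guarantee an anagram.

```java
import java.math.BigInteger;

public class ThreeChecksDemo {
    // Sum of the character codes of the word.
    static int charSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    // Product of the character codes; BigInteger avoids overflow (an assumption of this sketch).
    static BigInteger charProduct(String word) {
        BigInteger product = BigInteger.ONE;
        for (char c : word.toCharArray()) {
            product = product.multiply(BigInteger.valueOf(c));
        }
        return product;
    }

    // True when length, sum and product all agree.
    static boolean passesChecks(String a, String b) {
        return a.length() == b.length()
                && charSum(a) == charSum(b)
                && charProduct(a).equals(charProduct(b));
    }

    public static void main(String[] args) {
        System.out.println(charSum("abc"));             // 294
        System.out.println(charProduct("abc"));         // 941094
        System.out.println(passesChecks("abc", "cba")); // true
    }
}
```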
Assign a unique prime number to the letters a-z
Iterate your word array, creating a product of primes based on the letters in each word.
Store that product in your word list, with the corresponding word.
Sort the array, ascending by the product.
Iterate the array, doing a control break at every product change.
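The steps above could be sketched in Java roughly as follows. This assumes the input contains only lowercase a-z; the class and method names (`PrimeProductAnagrams`, `primeProduct`, `group`) are made up, and `BigInteger` is used here so that long words cannot overflow the product.

```java
import java.math.BigInteger;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PrimeProductAnagrams {
    // The first 26 primes, one per letter a-z.
    static final int[] PRIMES = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
            43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

    // Product of the primes assigned to the word's letters (assumes lowercase a-z).
    static BigInteger primeProduct(String word) {
        BigInteger product = BigInteger.ONE;
        for (char c : word.toCharArray()) {
            product = product.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
        }
        return product;
    }

    static List<List<String>> group(String[] words) {
        // Store each product together with its word.
        List<Map.Entry<BigInteger, String>> entries = new ArrayList<>();
        for (String w : words) {
            entries.add(new AbstractMap.SimpleEntry<>(primeProduct(w), w));
        }
        // Sort ascending by product (List.sort is stable, so input order survives within a group).
        entries.sort((x, y) -> x.getKey().compareTo(y.getKey()));
        // Control break: start a new group whenever the product changes.
        List<List<String>> groups = new ArrayList<>();
        BigInteger previous = null;
        for (Map.Entry<BigInteger, String> e : entries) {
            if (!e.getKey().equals(previous)) {
                groups.add(new ArrayList<>());
                previous = e.getKey();
            }
            groups.get(groups.size() - 1).add(e.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] words = {"man", "car", "kile", "arc", "none", "like"};
        for (List<String> g : group(words)) {
            System.out.println(String.join(" ", g));
        }
    }
}
```

Because each letter gets its own prime, two words share a product only when they contain the same letters the same number of times, which is what makes the product usable as a grouping key.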
Don't bother with a custom hash function at all. Use the normal string hash function on whatever your platform is. The important thing is to make the key for your hash table the idea of a "sorted word" - where the word is sorted by letter, so "car" => "acr". All anagrams will have the same "sorted word".
Just have a hash from "sorted word" to "list of words for that sorted word". In LINQ this is incredibly easy:
using System;
using System.Collections.Generic;
using System.Linq;

class FindAnagrams
{
    static void Main(string[] args)
    {
        var lookup = args.ToLookup(word => SortLetters(word));

        foreach (var entry in lookup)
        {
            foreach (var word in entry)
            {
                Console.Write(word);
                Console.Write(" ");
            }
            Console.WriteLine();
        }
    }

    static string SortLetters(string original)
    {
        char[] letters = original.ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }
}
Sample use:
c:\Users\Jon\Test>FindAnagrams.exe man car kile arc none like
man
car arc
kile like
none
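Since the question's own code is Java, here is a rough Java counterpart of the same "sorted word" idea. The class name `FindAnagramsJava` and the method `groupBySortedWord` are hypothetical. I picked `LinkedHashMap` so that groups come out in the order their first word was seen, which should mirror the sample output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FindAnagramsJava {
    // "car" => "acr": all anagrams share the same sorted form.
    static String sortLetters(String original) {
        char[] letters = original.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Maps each sorted word to the list of input words that sort to it.
    static Map<String, List<String>> groupBySortedWord(String[] words) {
        Map<String, List<String>> lookup = new LinkedHashMap<>();
        for (String word : words) {
            lookup.computeIfAbsent(sortLetters(word), k -> new ArrayList<>()).add(word);
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Prints one group of anagrams per line.
        for (List<String> group : groupBySortedWord(args).values()) {
            System.out.println(String.join(" ", group));
        }
    }
}
```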