Algorithm for grouping anagram words

悲&欢浪女 2020-12-07 23:30

Given a set of words, we need to find the anagrams and display each group of anagrams separately, using the best algorithm.

input:

man car kile arc none like
14 answers
  • 2020-12-08 00:05

    I would build the hash map from the sample word and ignore the rest of the alphabet.

    For example, if the word is "car", my hash table looks like this: c,0 a,1 r,2, with every other letter (b, d, e, ...) treated as MAX. As a result, any word whose hash is greater than 3 is treated as not matching.

    (More tuning:) my comparison method checks the running hash total inside the hash calculation itself, so it stops as soon as it can tell that the word does not match.

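    import java.util.HashMap;
    import java.util.Map;

    // Maps each letter of the sample word to its position in that word.
    // Letters that are not in the sample are simply absent from the map.
    public static Map<Character, Integer> getHashMap(String word) {
        Map<Character, Integer> map = new HashMap<>();
        for (int i = 0; i < word.length(); i++) {
            map.put(word.charAt(i), i);
        }
        return map;
    }

    // Sums the mapped positions of the word's letters and stops early as soon
    // as the running total exceeds `base` or a letter outside the sample appears.
    public static int alphaHash(String word, int base, Map<Character, Integer> map) {
        int result = 0;
        for (char c : word.toCharArray()) {
            Integer index = map.get(c);
            if (index == null) {
                // Letter not in the sample: the word cannot match. Returning here
                // avoids the int overflow of adding Integer.MAX_VALUE to the total.
                return Integer.MAX_VALUE;
            }
            result += index;
            if (result > base) {
                return result;
            }
        }
        return result;
    }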
    

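    Main method:

    // `sample` is the word to match and `args` holds the candidate words.
    Map<Character, Integer> map = getHashMap(sample);
    int sampleHash = alphaHash(sample, Integer.MAX_VALUE, map);
    for (String s : args) {
        // Equal length plus equal sum is still not proof of an anagram:
        // for the sample "car", "aaa" also sums to 3 and would be printed.
        if (s.length() == sample.length()
                && sampleHash == alphaHash(s, sampleHash, map)) {
            System.out.print(s + " ");
        }
    }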
    
  • 2020-12-08 00:07

    I don't think you'll find anything better than a hash table with a custom hash function (one that sorts the letters of the word before hashing it).

    Summing the letters will never work, because you can't make 'ac' and 'bb' different: they have the same sum.
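    A small Java sketch of the point above (the class and method names are made up for illustration): summing character codes makes "ac" and "bb" collide, while sorting the letters first gives a key that only true anagrams share.

```java
import java.util.Arrays;

public class SortedKeyDemo {
    // Sum of character codes: order-independent, but different words collide.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    // Sorting the letters gives a canonical key shared by all anagrams.
    static String sortedKey(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public static void main(String[] args) {
        System.out.println(letterSum("ac") + " " + letterSum("bb")); // 196 196
        System.out.println(sortedKey("ac") + " " + sortedKey("bb")); // ac bb
        System.out.println(sortedKey("car").equals(sortedKey("arc"))); // true
    }
}
```

    The sorted key can then be fed to an ordinary string hash, which is the "custom hash function" described here.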

  • 2020-12-08 00:07

    I wouldn't use hashing, since it adds overhead to every lookup and insert. Hashing, sorting and multiplication are all going to be slower than a simple array-based histogram that tracks the number of unique characters. The worst case is O(2n), i.e. one pass over each string:

    // structured for clarity
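    static bool isAnagram(string s1, string s2)
    {
        // one bucket per UTF-16 code unit; a 256-entry array would throw on non-Latin-1 characters
        int[] histogram = new int[char.MaxValue + 1];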
    
        int uniques = 0;
    
        // scan first string
        foreach (int c in s1)
        {
            // count occurrence
            int count = ++histogram[c];
    
            // count uniques
            if (count == 1)
            {
                ++uniques;
            }
        }
    
        // scan second string
        foreach (int c in s2)
        {
            // reverse count occurrence
            int count = --histogram[c];
    
            // reverse count uniques
            if (count == 0)
            {
                --uniques;
            }
            else if (count < 0) // trivial reject of longer strings or more occurrences
            {
                return false;
            }
        }
    
        // final histogram unique count should be 0
        return (uniques == 0);
    }
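    The question asks for grouping rather than a pairwise test. One way to adapt the histogram idea, sketched here in Java under the assumption that words are lowercase ASCII a-z, is to use the letter-count array itself as the grouping key, so each word is scanned once. The class name `HistogramGroups` is hypothetical, and this is an adaptation, not the answer's own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HistogramGroups {
    // 26-bucket letter histogram; assumes lowercase ASCII a-z only
    // (any other character throws ArrayIndexOutOfBoundsException).
    static String histogramKey(String word) {
        int[] counts = new int[26];
        for (char c : word.toCharArray()) {
            counts[c - 'a']++;
        }
        // Arrays.toString turns the counts into a string usable as a map key.
        return Arrays.toString(counts);
    }

    // Words with identical histograms are anagrams; LinkedHashMap keeps
    // the groups in order of first appearance.
    static List<List<String>> group(String[] words) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String w : words) {
            groups.computeIfAbsent(histogramKey(w), k -> new ArrayList<>()).add(w);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        String[] words = {"man", "car", "kile", "arc", "none", "like"};
        for (List<String> g : group(words)) {
            System.out.println(String.join(" ", g));
        }
    }
}
```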
    
  • 2020-12-08 00:07

    Anagrams can be found in the following way:

    1. The lengths of the words must match.
    2. Add up the integer values of the characters. The sums of two anagrams will match.
    3. Multiply the integer values of the characters. The products of two anagrams will match.

    So I think that with the three checks above we can find anagrams. Correct me if I'm wrong.


    Example: abc cba

    Length of both words is 3.

    The sum of the character codes (a = 97, b = 98, c = 99) is 294 for both words.

    The product of the character codes is 941094 for both words.
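    The three checks could be sketched in Java as follows (class and method names are hypothetical). The product uses `BigInteger` because a `long` can overflow once a word is longer than about nine letters (122^10 > 2^63). The sketch only implements the checks. Whether they are sufficient, and not merely necessary, is the open question the answer raises.

```java
import java.math.BigInteger;

public class SumProductCheck {
    // Check 1 (length), check 2 (sum) and check 3 (product) from the answer.
    static boolean passesChecks(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        return sum(a) == sum(b) && product(a).equals(product(b));
    }

    // Sum of the character codes.
    static long sum(String word) {
        long total = 0;
        for (char c : word.toCharArray()) {
            total += c;
        }
        return total;
    }

    // Product of the character codes, without overflow.
    static BigInteger product(String word) {
        BigInteger result = BigInteger.ONE;
        for (char c : word.toCharArray()) {
            result = result.multiply(BigInteger.valueOf(c));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sum("abc") + " " + product("abc")); // 294 941094
        System.out.println(passesChecks("abc", "cba"));       // true
    }
}
```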

  • 2020-12-08 00:10

    Assign a unique prime number to each letter a-z.

    Iterate over your word array, computing for each word the product of the primes of its letters.
    Store that product in your word list, alongside the corresponding word.

    Sort the array in ascending order of product.

    Iterate over the array, doing a control break (starting a new group) whenever the product changes.
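    A minimal Java sketch of these steps, assuming lowercase ASCII words (the class name `PrimeGroups` and the choice of the first 26 primes are illustrative). Because each letter gets a distinct prime, unique factorization means two words have the same product exactly when they are anagrams. `BigInteger` is used so long words cannot overflow.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class PrimeGroups {
    // One distinct prime per letter 'a'..'z'; assumes lowercase ASCII words.
    private static final int[] PRIMES = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    // Product of the primes assigned to the word's letters.
    static BigInteger primeProduct(String word) {
        BigInteger product = BigInteger.ONE;
        for (char c : word.toCharArray()) {
            product = product.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
        }
        return product;
    }

    // A word stored together with its prime product.
    static final class Entry {
        final BigInteger product;
        final String word;

        Entry(BigInteger product, String word) {
            this.product = product;
            this.word = word;
        }
    }

    static List<List<String>> group(String[] words) {
        List<Entry> entries = new ArrayList<>();
        for (String w : words) {
            entries.add(new Entry(primeProduct(w), w));
        }
        // Sort ascending by product so that anagrams become adjacent.
        entries.sort((x, y) -> x.product.compareTo(y.product));

        // Control break: open a new group whenever the product changes.
        List<List<String>> groups = new ArrayList<>();
        BigInteger previous = null;
        for (Entry e : entries) {
            if (!e.product.equals(previous)) {
                groups.add(new ArrayList<>());
                previous = e.product;
            }
            groups.get(groups.size() - 1).add(e.word);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] words = {"man", "car", "kile", "arc", "none", "like"};
        for (List<String> g : group(words)) {
            System.out.println(String.join(" ", g));
        }
    }
}
```

    Groups come out in order of product rather than input order, since the list is sorted before the control break.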

  • 2020-12-08 00:11

    Don't bother with a custom hash function at all. Use your platform's normal string hash function. The important thing is to make the key of your hash table the "sorted word", i.e. the word with its letters sorted, so "car" => "acr". All anagrams have the same "sorted word".

    Just have a hash from "sorted word" to "list of words for that sorted word". In LINQ this is incredibly easy:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    class FindAnagrams
    {
        static void Main(string[] args)
        {
            var lookup = args.ToLookup(word => SortLetters(word));
    
            foreach (var entry in lookup)
            {
                foreach (var word in entry)
                {
                    Console.Write(word);
                    Console.Write(" ");
                }
                Console.WriteLine();
            }
        }
    
        static string SortLetters(string original)
        {
            char[] letters = original.ToCharArray();
            Array.Sort(letters);
            return new string(letters);
        }
    }
    

    Sample use:

    c:\Users\Jon\Test>FindAnagrams.exe man car kile arc none like
    man
    car arc
    kile like
    none
    