Find if 2 strings are anagrams in O(1) space and O(n) time

安稳与你 submitted on 2019-12-05 16:28:29

Absolutely no expert here...

But why not go through each string and simply count how many times each letter turns up?

Given an appropriate implementation, this shouldn't take more than O(n) time.

Generate an array of 26 prime numbers, one per letter. Then traverse each string, multiplying together the primes of its characters; if the two products are equal, the strings are anagrams, otherwise not. This takes O(n) time and constant space, as long as the product fits in a fixed-width integer.
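A minimal sketch of the prime-product idea, assuming the input contains only lowercase `a`..`z`. The class name `PrimeProductAnagram` and the prime table are illustrative choices, not something from the answer. It uses `BigInteger` so the product cannot overflow, which means the product grows with the string length and the constant-space claim no longer strictly holds for this variant.

```java
import java.math.BigInteger;

final class PrimeProductAnagram {
    // First 26 primes, one per letter 'a'..'z' (an illustrative table).
    private static final int[] PRIMES = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    // Product of the primes assigned to each letter of s.
    // Assumes s contains only lowercase 'a'..'z'.
    static BigInteger signature(String s) {
        BigInteger product = BigInteger.ONE;
        for (char c : s.toCharArray()) {
            product = product.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
        }
        return product;
    }

    // By unique factorization, equal products mean equal letter counts.
    static boolean areAnagrams(String a, String b) {
        return signature(a).equals(signature(b));
    }

    public static void main(String[] args) {
        System.out.println(areAnagrams("listen", "silent"));
        System.out.println(areAnagrams("ac", "bb"));
    }
}
```

Swapping `BigInteger` for `long` would restore constant space, but the product then overflows after roughly ten letters and equal values become only a strong hint.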


There are a couple of ways to solve it.

Method 1 - Using custom hash code function
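We can have a hashCode function like:

static int[] primes = {3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103};
static String alphabet = "abcdefghijklmnopqrstuvwxyz";


// Multiplies the primes assigned to each letter; assumes s contains only 'a'..'z'.
// A long overflows after roughly ten letters, so for longer strings
// equal values are only a strong hint, not a proof.
public static long hashCode(String s){
    long product = 1;

    for(char c: s.toCharArray()){
      product *= primes[c - 'a'];
    }
    return product;
}

Generate the hash of both strings. If the hashes differ, the strings are not anagrams; if they are equal and no overflow occurred, the strings are anagrams. The primes are multiplied rather than added because sums collide: with this table, "ac" and "bb" both sum to 10. This method is similar to the solution mentioned by Jin, as it in some way generates a hash code for the string.
Time complexity - O(n)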

Method 2 - Use a hashmap of Character and Integer
Treat the 2 strings as 2 arrays of characters. Traverse the first array and add each character to a hashmap from character to count, incrementing the count each time the character appears. Then traverse the second array and decrement the count for each character; if a character is not in the map, the strings are not anagrams. Finally, if every character in the map has a count of 0, the 2 strings are anagrams.
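The hashmap method described above could be sketched roughly as follows. The class name `MapAnagram` is hypothetical, and two details are my own variations rather than part of the answer: an early length check, and removing an entry once its count reaches 0 so that the final test is simply whether the map is empty.

```java
import java.util.HashMap;
import java.util.Map;

final class MapAnagram {
    static boolean areAnagrams(String a, String b) {
        // Shortcut (an addition): strings of different length cannot be anagrams.
        if (a.length() != b.length()) {
            return false;
        }
        // Count every character of the first string.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : a.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Consume one occurrence for every character of the second string.
        for (char c : b.toCharArray()) {
            Integer remaining = counts.get(c);
            if (remaining == null) {
                return false; // b has a character that a lacks or has run out of
            }
            if (remaining == 1) {
                counts.remove(c);
            } else {
                counts.put(c, remaining - 1);
            }
        }
        // Every occurrence was matched exactly when nothing is left over.
        return counts.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(areAnagrams("listen", "silent"));
        System.out.println(areAnagrams("aab", "abb"));
    }
}
```

The map holds at most one entry per distinct character, so for a fixed alphabet the extra space stays constant.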

Method 3 - Use a count array (my favourite)


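static boolean areAnagrams(String string1, String string2) {
 
    // one counter per letter; assumes the strings contain only the letters a-z (any case)
    int[] counts = new int[26];
 
    for (char c : string1.toLowerCase().toCharArray())
        counts[c - 'a']++;
 
    for (char c : string2.toLowerCase().toCharArray())
        counts[c - 'a']--;
 
    for (int count : counts)
        if (count != 0)
            return false;
 
    return true;
}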


Yes, use a hash and count occurrences. If at the end we have a non-zero count, then the strings are not anagrams.

let h => hash which maps letters to occurrence_count (initialized to 0)

for each letter l in string a
  h[l] = h[l] + 1
end

for each letter l in string b
  h[l] = h[l] - 1
end

for each key l in h 
  return false if h[l] != 0
end

return true

This will run in O(n) + O(n) + c = O(n). Our hash contains 26-letter spots, each with an integer associated with it. The space is therefore O(26) = O(1)

[[Edit]], same as above, but with time-analysis annotations:

let h => hash which maps letters to occurrence_count (initialized to 0)

#this loop runs n times
for each letter l in string a
  #hash lookups / writes are constant time
  h[l] = h[l] + 1
end
#the loop above ran in O(n) time

for each letter l in string b
  h[l] = h[l] - 1
end

#runs in O(alphabet) = O(c) = constant-time
for each key l in h 
  return false if h[l] != 0
end

return true
user1702733

Running time: O(n) + O(n) = O(n)

Fixed space used: O(256) = O(1)

Here is the code in Java:

private static boolean isAnagramWithOneArray(String strFirst, String strSecond) {
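    // Assumes every character has a code in 0..255; any other character
    // would index outside the array.
    int[] charsCount = new int[256];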

    if (strFirst != null && strSecond != null) {
        if (strFirst.length() != strSecond.length()) {
            return false;
        }
        for (int i = 0; i < strFirst.length(); i++) {
            charsCount[strFirst.charAt(i)]++;
            charsCount[strSecond.charAt(i)]--;
        }
        for (int i = 0; i < charsCount.length; i++) {
            if (charsCount[i] != 0) {
                return false;
            }
        }
        return true;
    } else {
        return (strFirst == null && strSecond == null);
    }
}
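
/* Maps a letter to 0..25 regardless of case; returns 0xff (-1 as unsigned char) for anything else. */
unsigned char CharacterCheck(char item)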
{

    if ((item >= 'A') && (item <= 'Z'))
        return (item - 'A');

    if ((item >= 'a') && (item <= 'z'))
        return ((item - ('a' - 'A')) - 'A');

    return -1;

}

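/* Returns 0 when the letter-value sums of both strings match and 1 otherwise.
   Matching sums are necessary for an anagram but not sufficient:
   "ad" and "bc" both sum to 3. */
unsigned char AnagramCheck6 (char * item1, char * item2)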
{
    char *test                      = item1;
    char *test2                     = item2;
    int count                       = 0;
    unsigned char rc                = 0;
    unsigned char rslt              = 0;

    while (*test && *test2)
    {
        rslt = CharacterCheck(*test++);

        if (rslt != 0xff)
            count += rslt;

        rslt = CharacterCheck(*test2++);

        if (rslt != 0xff)
            count -= rslt;
    }

    if (*test)
    {
        while (*test)
        {
            rslt = CharacterCheck(*test++);

            if (rslt != 0xff)
                count += rslt;
        }
    }

    if (*test2)
    {
        while (*test2)
        {
            rslt = CharacterCheck(*test2++);

            if (rslt != 0xff)
                count -= rslt;
        }
    }

    if (count)
        rc = 1;

    return rc;

}

This snippet checks that each character is a letter and converts case if needed. The second and third loops take into account strings of different lengths.

If you put a word's characters in sorted order and hash the resulting String, every String that has the same hash after sorting will very probably be an anagram of the other (there is always a chance of collisions).

The code above will not work in all cases.

Rather, we can quicksort both strings and compare the resulting arrays while eliminating spaces.
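The sort-and-compare suggestion could look roughly like this in Java. The names `SortedAnagram` and `sortedLetters` are made up for illustration, and the sketch relies on the standard library's `Arrays.sort` rather than a hand-written quicksort. A comparison sort costs O(n log n), so this gives up the O(n) bound in exchange for simplicity.

```java
import java.util.Arrays;

final class SortedAnagram {
    // Removes spaces, then returns the remaining characters in sorted order.
    static char[] sortedLetters(String s) {
        char[] chars = s.replace(" ", "").toCharArray();
        Arrays.sort(chars);
        return chars;
    }

    // Two strings are anagrams when their sorted characters are identical.
    static boolean areAnagrams(String a, String b) {
        return Arrays.equals(sortedLetters(a), sortedLetters(b));
    }

    public static void main(String[] args) {
        System.out.println(areAnagrams("dormitory", "dirty room"));
        System.out.println(areAnagrams("abc", "abd"));
    }
}
```

The sorted copies take O(n) extra space, so this variant does not meet the O(1) space requirement of the question either.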

All suggestions here tend to use the same approach of sorting the input strings and then comparing the results. Since we are mostly interested in regular ASCII letters, this can be optimized by counting sort, which seems to be most answerers' approach. Counting sort can sort values from a limited alphabet of integers in O(n), so technically these are correct answers. If we have to account for the time to traverse the count array afterwards, the bound also includes the alphabet size m, making O(m+n) a somewhat more correct upper bound in cases where the alphabet is UTF-32.

I tend to think the most generally correct approach requires O(n lg n), since a quicksort may prove faster in practice when the alphabet cannot be limited sufficiently.
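To make the O(m+n) remark concrete, here is a small counting-sort sketch, assuming an alphabet of just the lowercase letters `a`..`z` (so m = 26). The names `CountingSortAnagram`, `ALPHABET` and `countingSort` are hypothetical. The first loop touches each of the n characters once and the second walks all m counters, which is where the m term comes from.

```java
final class CountingSortAnagram {
    private static final int ALPHABET = 26; // m: assumed alphabet 'a'..'z'

    // Counting sort: tally each letter, then emit the letters in alphabetical order.
    // Takes O(n + m) time for n characters and an alphabet of size m.
    static String countingSort(String s) {
        int[] counts = new int[ALPHABET];
        for (char c : s.toCharArray()) {
            counts[c - 'a']++;
        }
        StringBuilder sorted = new StringBuilder(s.length());
        for (int letter = 0; letter < ALPHABET; letter++) {
            for (int k = 0; k < counts[letter]; k++) {
                sorted.append((char) ('a' + letter));
            }
        }
        return sorted.toString();
    }

    // Anagrams sort to the same string.
    static boolean areAnagrams(String a, String b) {
        return countingSort(a).equals(countingSort(b));
    }

    public static void main(String[] args) {
        System.out.println(countingSort("banana"));
        System.out.println(areAnagrams("listen", "silent"));
    }
}
```

Comparing the two count arrays directly would avoid building the sorted strings; the sketch materializes them only to show the sorting view of the problem.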

I would do it something like this:

//is s an anagram of t?
#include <string>

bool is_anagram(const std::string& s, const std::string& t)
    {
    bool ret = false;

    //if s and t are anagrams, they must be of the same size
    if( s.length() != t.length() )
        {
        return ret;
        }

    //assume that s and t contain single-byte characters only;
    //int counters avoid overflow when a character repeats more than 127 times
    int letters[ 256 ] = {0};
    std::string::size_type i;

    // count the occurrences of each character in string s
    for( i = 0; i < s.length(); i++ )
        {
        //cast to unsigned char so that the index is never negative
        (letters[ static_cast<unsigned char>( s[i] ) ])++;
        }

    // now walk string t and decrement the count of each
    // corresponding character
    for( i = 0; i < t.length(); i++ )
        {
        //a count below 0 means t has more of this character than s
        if( --(letters[ static_cast<unsigned char>( t[i] ) ]) < 0 )
            {
            return ret;
            }
        }

    //out of for loop means success
    ret = true;
    return ret;
    }

Maybe something like:

    String str1 = "test";
    String str2 = "tzet";
    int count = 0;
    for (int i = 0; i < str1.length(); i++)
    {
        count = count + str1.charAt(i) - str2.charAt(i);
    }
    System.out.println(count);

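Subtract every character of string 2 from count and add every character of string 1 to it (assuming ASCII characters and strings of equal length). If they are anagrams, count will be equal to zero. The converse does not hold, however: "ad" and "bc" also give zero, so a zero count is necessary but not sufficient.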

This doesn't account for anagrams that have inserted spaces, though.
