Determine if A is permutation of B using ASCII values

Question

I wrote a function to determine if string a is a permutation of string b. The definition is as follows:

bool isPermutation(std::string a, std::string b){
    if(a.length() != b.length())
        return false;
    int a_sum, b_sum;
    a_sum = b_sum = 0;
    for(int i = 0; i < a.length(); ++i){
        a_sum += a.at(i);
        b_sum += b.at(i);
    }
    return a_sum == b_sum;
}

The issue with my approach is that different strings can have the same sum: if a = "600000" and b = "111111", the function returns true.
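The collision can be checked by hand: in ASCII, '6' is 54 and '0' is 48, so "600000" sums to 54 + 5 * 48 = 294, while '1' is 49, so "111111" sums to 6 * 49 = 294 as well. A minimal sketch of that arithmetic, assuming an ASCII-compatible character set (the helper name asciiSum is made up for illustration):

```cpp
#include <string>

// Hypothetical helper: adds up the character codes of a string,
// the same quantity the loop in isPermutation accumulates.
int asciiSum(const std::string& s) {
    int sum = 0;
    for (unsigned char ch : s)
        sum += ch;
    return sum;
}
```

Both asciiSum("600000") and asciiSum("111111") come out as 294, which is why comparing sums alone cannot distinguish the two strings.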

Is there any way I can keep my general approach to this problem (as opposed to sorting the strings then doing strcmp) and maintain correctness?

Answer 1:

You can count characters separately:

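#include <cassert>
#include <climits>
#include <string>

bool isPermutation(std::string a, std::string b)
{
    if(a.length() != b.length())
        return false;

    // Guard against overflowing the int counters below.
    assert(a.length() <= INT_MAX);
    assert(b.length() <= INT_MAX);

    // One counter per possible unsigned char value.
    int counts[256] = {};
    for (unsigned char ch : a)
        ++counts[ch];
    for (unsigned char ch : b)
        --counts[ch];
    // A nonzero counter means some character occurs more often in one string.
    for (int count : counts)
        if (count)
            return false;

    return true;
}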

Answer 2:

A simple approach if you don't need UTF-8 support

The solution to this problem is surprisingly easy. There is a function in the standard library that handles this.

Assume that a and b are two strings:

return is_permutation(a.begin(), a.end(), b.begin(), b.end());

Or, if you don't have access to C++14 yet:

return a.size() == b.size() && is_permutation(a.begin(), a.end(), b.begin());

Note, though, that the complexity of this is only guaranteed to be no worse than quadratic in the length of the string. So, if this matters, sorting both strings could indeed be a better solution:

string aa(a); sort(aa.begin(), aa.end());
string bb(b); sort(bb.begin(), bb.end());
return (aa == bb);

And if this is also too slow, use John Zwinck's answer above, which is linear in complexity.

Link to the documentation for is_permutation: http://en.cppreference.com/w/cpp/algorithm/is_permutation

Link to the documentation for sort: http://en.cppreference.com/w/cpp/algorithm/sort
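For reference, the fragments above could be wrapped into complete functions roughly as follows. This is only a sketch: the function names isPermutationStd and isPermutationSorted are made up for illustration, and the four-iterator overload of std::is_permutation assumes a C++14 compiler.

```cpp
#include <algorithm>
#include <string>

// Hypothetical wrapper around the C++14 four-iterator overload,
// which copes with ranges of different lengths by itself.
bool isPermutationStd(const std::string& a, const std::string& b) {
    return std::is_permutation(a.begin(), a.end(), b.begin(), b.end());
}

// Hypothetical wrapper around the sorting approach. The by-value
// parameters are the copies that get sorted, so the caller's strings
// are left untouched.
bool isPermutationSorted(std::string a, std::string b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

With either function, "abc" and "cab" compare as permutations, while "600000" and "111111" from the question do not.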

A (little) more complex approach if UTF-8 support is required

The above may fail on UTF-8 strings. The issue here is that UTF-8 is a multibyte character encoding, that is, a single character may be encoded in multiple char variables. None of the approaches mentioned above are aware of this, and all assume that a single character is also a single char variable. An example of two UTF-8 strings where these approaches fail is here: http://ideone.com/erfNmC
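To make the failure concrete, here is a small sketch. The strings are picked for illustration and are not taken from the linked page: "éĄ" (U+00E9 U+0104) encodes to the bytes C3 A9 C4 84, and "Äĩ" (U+00C4 U+0129) encodes to C3 84 C4 A9. The two strings share no character, yet their bytes are a rearrangement of each other. All names in the block are hypothetical.

```cpp
#include <algorithm>
#include <string>

// UTF-8 bytes of U+00E9 U+0104: C3 A9 C4 84
const std::string utf8A = "\xC3\xA9\xC4\x84";
// UTF-8 bytes of U+00C4 U+0129: C3 84 C4 A9
const std::string utf8B = "\xC3\x84\xC4\xA9";

// The same two strings with one char32_t per code point.
const std::u32string utf32A = U"\u00E9\u0104";
const std::u32string utf32B = U"\u00C4\u0129";

// Byte-wise check: reports a permutation, because only the bytes are compared.
bool bytesArePermutation() {
    return std::is_permutation(utf8A.begin(), utf8A.end(),
                               utf8B.begin(), utf8B.end());
}

// Code-point-wise check: reports no permutation, which is the expected answer.
bool codePointsArePermutation() {
    return std::is_permutation(utf32A.begin(), utf32A.end(),
                               utf32B.begin(), utf32B.end());
}
```

Here bytesArePermutation() returns true while codePointsArePermutation() returns false, so the byte-level answer is wrong for these inputs.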

The solution may be to temporarily copy our UTF-8 string to a fixed-length UTF-32 encoded string. Assume that a and b are two UTF-8 encoded strings:

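// Requires <locale> and <codecvt>; note that wstring_convert and
// codecvt_utf8 are deprecated since C++17.
u32string a32 = wstring_convert<codecvt_utf8<char32_t>, char32_t>{}.from_bytes(a);
u32string b32 = wstring_convert<codecvt_utf8<char32_t>, char32_t>{}.from_bytes(b);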

Then you can correctly use the aforementioned functions on those UTF-32 encoded strings:

return is_permutation(a32.begin(), a32.end(), b32.begin(), b32.end());

or:

sort(a32.begin(), a32.end());
sort(b32.begin(), b32.end());
return (a32 == b32);

The downside is that John Zwinck's approach now becomes a little less practical: you would have to declare an array of 1114112 elements, as this is how many Unicode code points exist (U+0000 through U+10FFFF).
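One possible way to keep the counting idea, sketched here under the assumption that the strings have already been converted to UTF-32, is to store the counts in a hash map keyed by code point, so that only code points that actually occur take up space. The function name isPermutationU32 is hypothetical, and the check works on code points, so it does not treat a precomposed character and its combining-sequence equivalent as equal.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: counting-based permutation check on UTF-32 strings,
// linear in the string length on average.
bool isPermutationU32(const std::u32string& a, const std::u32string& b) {
    if (a.size() != b.size())
        return false;

    // Number of occurrences of each code point seen in a.
    std::unordered_map<char32_t, long long> counts;
    for (char32_t ch : a)
        ++counts[ch];

    // Every code point of b must use up one occurrence counted from a.
    for (char32_t ch : b)
        if (--counts[ch] < 0)
            return false;

    // Lengths are equal and no count went negative, so every count is zero.
    return true;
}
```

The trade-off compared with the fixed array is hashing overhead per character in exchange for memory proportional to the number of distinct code points in the input.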

More about conversions to UTF-32: http://en.cppreference.com/w/cpp/locale/wstring_convert/from_bytes

Answer 3:

std::sort( strOne.begin(), strOne.end() );
std::sort( strTwo.begin(), strTwo.end() );    
return strOne == strTwo;

will be sufficient.

My suggestion is to use std::unordered_map, i.e.

std::unordered_map< char, unsigned > umapOne;
std::unordered_map< char, unsigned > umapTwo;
for( char c : strOne ) ++umapOne[c];
for( char c : strTwo ) ++umapTwo[c];
return umapOne == umapTwo;

As an optimization, you can add a length check at the top of either solution:

if( strOne.size() != strTwo.size() ) return false;

A better std::unordered_map solution, which needs only one map:

if( strOne.size() != strTwo.size() ) return false; // required
std::unordered_map< char, int > umap;
for( char c : strOne ) ++umap[c];
for( char c : strTwo ) if( --umap[c] < 0 )  return false;
return true;

If you just need to solve the problem and don't care how it is done, you may use std::is_permutation:

return std::is_permutation( strOne.begin(), strOne.end(), strTwo.begin(), strTwo.end() );

Source: https://stackoverflow.com/questions/36818877/determine-if-a-is-permutation-of-b-using-ascii-values

Tags

c++

c++11

permutation