O(N) Identification of Permutations

Question

This answer determines whether two strings are permutations of each other by comparing their contents. If they contain the same number of each character, they are permutations. This is accomplished in O(N) time.

I don't like the answer though because it reinvents what is_permutation is designed to do. That said, is_permutation has a complexity of:

At most O(N²) applications of the predicate, or exactly N if the sequences are already equal, where N=std::distance(first1, last1)

So I cannot advocate the use of is_permutation where it is orders of magnitude slower than a hand-spun algorithm. But surely the implementer of the standard would not miss such an obvious improvement? So why is is_permutation O(N²)?

Answer 1:

It was I who wrote that answer.

When the string's value_type is char, the number of elements required in a lookup table is 256. For a two-byte encoding, 65536. For a four-byte encoding, the lookup table would have just over 4 billion entries, at a likely size of 16 GB! And most of it would be unused.
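To make that arithmetic concrete, here is a small sketch. The helper name table_bytes is hypothetical, and the 4-byte counter width is an assumption (it is what the 16 GB figure implies); 8-bit bytes are assumed as well.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for a direct-indexed table with one
// counter for every possible value of a key that is `key_bytes` wide.
// Assumes 8-bit bytes and key_bytes < 8.
constexpr std::uint64_t table_bytes(unsigned key_bytes, unsigned counter_bytes) {
    return (std::uint64_t{1} << (8 * key_bytes)) * counter_bytes;
}

// Assumed counter width: 4 bytes per entry.
static_assert(table_bytes(1, 4) == 1024, "256 entries, 1 KiB");
static_assert(table_bytes(2, 4) == 262144, "65536 entries, 256 KiB");
static_assert(table_bytes(4, 4) == 17179869184ULL, "2^32 entries, 16 GiB");
```

The first two tables fit comfortably in cache or on the stack; the third does not fit in the memory of most machines.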

So the first thing is to recognize that even if we restrict the types to char and wchar_t, it may still be untenable. Likewise if we want to do is_permutation on sequences of type int.

We could have a specialization of std::is_permutation<> for integral types of size 1 or 2 bytes. But this is somewhat reminiscent of std::vector<bool> which not everyone thinks was a good idea in retrospect.

We could also use a lookup table based on std::map<T, size_t>, but this is likely to be allocation-heavy so it might not be a performance win (or at least, not always). It might be worth implementing one for a detailed comparison though.
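For what it's worth, a map-based version might look roughly like the sketch below. The name is_permutation_by_map is hypothetical and this is not how any standard library implements std::is_permutation. Note that it assumes T is copyable and has operator<, which std::is_permutation does not require, and that it runs in O(N log N) rather than O(N).

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical sketch: permutation check using a std::map of counts.
// Requires the value type to be copyable and ordered by operator<.
template <typename It1, typename It2>
bool is_permutation_by_map(It1 first1, It1 last1, It2 first2, It2 last2) {
    using T = typename std::iterator_traits<It1>::value_type;
    std::map<T, std::size_t> counts;  // one node allocation per distinct value

    // Count every element of the first range.
    for (; first1 != last1; ++first1) ++counts[*first1];

    // Cancel each element of the second range against the counts.
    for (; first2 != last2; ++first2) {
        auto it = counts.find(*first2);
        if (it == counts.end()) return false;  // absent, or already used up
        if (--it->second == 0) counts.erase(it);
    }

    // Anything left over means the first range had extra elements.
    return counts.empty();
}
```

Every distinct value costs a heap allocation here, which is the overhead mentioned above; whether it beats the quadratic algorithm depends on the input size and on how many distinct values there are.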

In summary, I don't fault the C++ standard for not including a high-performance version of is_permutation for char. First because in the real world I'm not sure it's the most common use of the template, and second because the STL is not the be-all and end-all of algorithms, especially where domain knowledge can be used to accelerate computation for special cases.

If it turns out that is_permutation for char is quite common in the wild, C++ library implementors would be within their rights to provide a specialization for it.

Answer 2:

is_permutation works on almost any data type. The algorithm in your link works for data types with a small number of values only.

It's the same reason why std::sort is O(N log N) but counting sort is O(N).

Answer 3:

The answer you cite works on chars. It assumes they are 8-bit (not necessarily the case), so there are only 256 possibilities for each value, and it assumes that you can cheaply go from each value to a numeric index to use for a lookup table of counts (for char in this case, the value and the index are the same thing!).

It generates a count of how many times each char value occurs in each string; then, if these distributions are the same for both strings, the strings are permutations of each other.

What is the time complexity?

You have to walk each character of each string, so that is M+N steps for two inputs of lengths M and N.
Each of these steps involves incrementing a count in a fixed-size table at an index given by the char, so each step is constant time.

So the overall time complexity is O(N+M): linear, as you describe.
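The counting approach described above can be sketched as follows. This is my own minimal version, not the code from the cited answer: the name is_permutation_by_count is hypothetical, and it uses one table that is incremented for the first string and decremented for the second, rather than two tables that are compared afterwards.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical sketch: counting-table permutation check for std::string.
bool is_permutation_by_count(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;  // different lengths never match

    // One signed counter per possible unsigned char value
    // (256 entries when CHAR_BIT == 8).
    std::array<std::ptrdiff_t, UCHAR_MAX + 1> counts{};

    for (unsigned char c : a) ++counts[c];  // one constant-time step per char of a
    for (unsigned char c : b) --counts[c];  // one constant-time step per char of b

    // Fixed-size scan whose cost does not depend on the input lengths.
    for (std::ptrdiff_t n : counts)
        if (n != 0) return false;
    return true;
}
```

Converting each char to unsigned char before indexing matters: where plain char is signed, a negative value would otherwise index outside the table.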

Now, std::is_permutation makes no such assumptions about its input. It doesn't know that there are only 256 possibilities, or indeed that they are bounded at all. It doesn't know how to go from an input value to a number it can use as an index, never mind how to do that in constant time. The only thing it knows is how to compare two values for equality, because the caller supplies that information.

So, the time complexity:

We know it has to consider each element of each input at some point.
We know that, for each element it hasn't seen before (I'll leave discussion of how that's determined, and why that doesn't impact the big-O complexity, as an exercise), it cannot turn the element into any kind of index or key for a table of counts. So it has no better way of counting the occurrences of that element than a linear walk through both inputs to see how many elements match.

So the worst-case complexity is going to be quadratic.
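A minimal sketch of that equality-only strategy is below. It is not the code of any particular standard library, and the name is_permutation_by_equality is hypothetical; it assumes forward iterators (the ranges are walked more than once) and uses nothing but operator== on the elements.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical sketch: permutation check that only compares for equality.
template <typename It1, typename It2>
bool is_permutation_by_equality(It1 first1, It1 last1, It2 first2, It2 last2) {
    if (std::distance(first1, last1) != std::distance(first2, last2))
        return false;

    for (It1 i = first1; i != last1; ++i) {
        // Skip values that were already counted at an earlier position.
        bool seen = false;
        for (It1 j = first1; j != i; ++j)
            if (*j == *i) { seen = true; break; }
        if (seen) continue;

        // No index or key is available, so count by walking both ranges.
        std::ptrdiff_t n1 = 0, n2 = 0;
        for (It1 j = i; j != last1; ++j)
            if (*j == *i) ++n1;
        for (It2 k = first2; k != last2; ++k)
            if (*k == *i) ++n2;
        if (n1 != n2) return false;
    }
    return true;
}
```

With N distinct elements, the outer loop runs N times and each pass does O(N) comparisons, which is where the O(N²) bound comes from.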

Source: https://stackoverflow.com/questions/36865275/on-identification-of-permutations

Tags

c++

big-o

permutation

string-comparison

standard-library