Puzzle: Need an example of a “complicated” equivalence relation / partitioning that disallows sorting and/or hashing

Question

From the question "Is partitioning easier than sorting?":

Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.

One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.

(Keep in mind the distinction between equality and equivalence.)

Clearly the equivalence relation must be considered when designing the ordering algorithm. For example, if the equivalence relation is "people born in the same year are equivalent", then sorting based on the person's name is not appropriate.
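A minimal Python sketch of the sort-then-group idea, using the birth-year example. The `Person` type and `partition_by_sorting` name are hypothetical. The point it shows is that the sort key has to agree with the equivalence.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Person:
    name: str
    birth_year: int


def partition_by_sorting(people):
    """Partition people into 'same birth year' classes by sorting on that year.

    The sort key must agree with the equivalence (equivalent items get equal
    keys), so that after sorting every class is one contiguous run.
    """
    by_year = attrgetter("birth_year")
    ordered = sorted(people, key=by_year)
    # groupby only merges *adjacent* items with equal keys, hence the sort.
    return [list(group) for _, group in groupby(ordered, key=by_year)]


people = [Person("Ann", 1990), Person("Bob", 1985), Person("Cid", 1990)]
for part in partition_by_sorting(people):
    print([p.name for p in part])
# prints ['Bob'] then ['Ann', 'Cid']
```

Sorting by name instead (Ann, Bob, Cid) would leave the two people born in 1990 non-adjacent, so the same grouping step would report three classes instead of two.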

Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?
How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that will map equivalent items to the same hash value?

(Note: it is OK if nonequivalent items map to the same hash value (collide) -- I'm not asking to solve the collision problem -- but on the other hand, hashFunc(item) { return 1; } is cheating.)
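For comparison, partitioning by hashing can be sketched with a dictionary keyed on a value that equivalent items share. This is a minimal sketch; `partition_by_hashing` and the `(name, birth_year)` data are hypothetical.

```python
from collections import defaultdict


def partition_by_hashing(items, key):
    """Group items into equivalence classes via a hash table.

    `key` must map equivalent items to equal (hashable) values; the dict then
    hashes that value. Collisions between non-equivalent keys are resolved by
    the dict itself, so they cost time but never correctness.
    """
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return list(buckets.values())


# Hypothetical data: (name, birth_year) pairs, equivalent when the years match.
people = [("Ann", 1990), ("Bob", 1985), ("Cid", 1990)]
print(partition_by_hashing(people, key=lambda p: p[1]))
# → [[('Ann', 1990), ('Cid', 1990)], [('Bob', 1985)]]
```

If the key's `__hash__` returned a constant, the dict would still be correct, but every lookup would have to compare against all colliding keys, which is the "cheating" case from the note above.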

My suspicion is that for any datatype/equivalence pair where it is possible to define an ordering, it will also be possible to define a suitable hash function, and they will have similar algorithmic complexity. A counterexample to that conjecture would be enlightening!

Answer 1:

The answer to questions 1 and 2 is no, in the following sense: given a computable equivalence relation ≡ on strings {0, 1}^*, there exists a computable function f such that x ≡ y if and only if f(x) = f(y); comparing or hashing f(x) then gives both an ordering and a hash function. One definition of f(x) is simple, and very slow to compute: enumerate {0, 1}^* in length-lexicographic (shortlex) order (ε, 0, 1, 00, 01, 10, 11, 000, …) and return the first string equivalent to x. We are guaranteed to stop by the time we reach x itself, so this algorithm always halts.
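The construction of f can be sketched directly. This assumes the equivalence is given as a Python predicate; the example relation "same number of 1s" is hypothetical. The enumeration is length-then-lexicographic (ε, 0, 1, 00, …), which is what lets it reach every string.

```python
from itertools import count, product


def binary_strings():
    """Yield every string over {0, 1} in shortlex order: '', 0, 1, 00, 01, ..."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def canonical(x, equiv):
    """Return the first string, in shortlex order, that is equivalent to x.

    Terminates because x itself is eventually enumerated and equiv(x, x) holds
    (reflexivity). x ≡ y iff canonical(x) == canonical(y), so the result can be
    compared or hashed directly.
    """
    for s in binary_strings():
        if equiv(s, x):
            return s


def same_popcount(a, b):
    # Hypothetical relation: equivalent iff the strings contain equally many 1s.
    return a.count("1") == b.count("1")


print(canonical("0110", same_popcount))  # prints 11
```

The running time is exponential in the length of the canonical string, which is why this only settles the existence question, not the practical one.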

Answer 2:

Creating a hash function and an ordering may be expensive but will usually be possible. One trick is to represent an equivalence class by a pre-arranged member of that class, for instance, the member whose serialised representation is smallest, when considered as a bit string. When somebody hands you a member of an equivalence class, map it to this canonicalised member of that class, and then hash or compare the bit string representation of that member. See e.g. http://en.wikipedia.org/wiki/Canonical#Mathematics
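As a concrete, hypothetical instance of canonicalisation: treat two strings as equivalent when one is a rotation of the other, and use the lexicographically smallest rotation as the pre-arranged member. Both the hash (dict key) and the sort key are then just that canonical string.

```python
def canonical_rotation(s):
    """Canonical representative of s under rotation: its smallest rotation.

    Naive O(len(s)^2) version; linear-time methods exist (e.g. Booth's algorithm).
    """
    if not s:
        return s
    return min(s[i:] + s[:i] for i in range(len(s)))


def same_up_to_rotation(a, b):
    return canonical_rotation(a) == canonical_rotation(b)


words = ["abc", "bca", "cab", "acb", "bac"]
classes = {}
for w in words:
    # The canonical member serves as the hash key (and would work as a sort key).
    classes.setdefault(canonical_rotation(w), []).append(w)
print(classes)
# → {'abc': ['abc', 'bca', 'cab'], 'acb': ['acb', 'bac']}
```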

Examples where this is not possible, or not convenient, include being handed a pointer to an object that implements equals() but nothing else useful, when you are not allowed to break the type system to look inside the object, and getting the results of a survey that only asks people to judge equality between objects. Also, Kruskal's algorithm uses Union-Find internally to process equivalence relations, so presumably nothing more cost-effective has been found for that particular application.
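When all you have is pairwise "these are the same" judgments, as in the survey case, a disjoint-set (Union-Find) structure builds the classes as the transitive closure of those judgments. A minimal sketch with path halving and union by size; the class name and data are hypothetical, and it assumes the judgments are meant to be transitive.

```python
class DisjointSet:
    """Union-Find with path halving and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]

    def classes(self):
        groups = {}
        for x in self.parent:
            groups.setdefault(self.find(x), []).append(x)
        return list(groups.values())


# Hypothetical survey results: pairs that respondents judged "the same".
judged_equal = [("a", "b"), ("c", "d"), ("b", "e")]
ds = DisjointSet("abcdef")
for x, y in judged_equal:
    ds.union(x, y)
print(sorted(sorted(c) for c in ds.classes()))
# → [['a', 'b', 'e'], ['c', 'd'], ['f']]
```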

Answer 3:

One example that seems to fit your request is an IEEE floating-point type. In particular, a NaN doesn't compare as equal to anything else (not even to itself) unless you take special steps to detect NaNs and always treat them as equivalent to each other.
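A small Python illustration of why plain `==` on floats is not an equivalence relation, and one way to repair it. The choice to make all NaNs equivalent to each other (and to nothing else) is an assumption; `float_equiv` is a hypothetical name.

```python
import math


def float_equiv(a, b):
    """Equivalence on floats that, unlike ==, is reflexive for NaN.

    Every NaN is treated as equivalent to every other NaN (and to nothing
    else); all other values use ordinary ==.
    """
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b


nan = float("nan")
print(nan == nan)             # False: == is not reflexive on NaN
print(float_equiv(nan, nan))  # True
print(float_equiv(nan, 1.0))  # False
```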

Likewise for hashing. In IEEE 754, +0.0 and -0.0 have different bit patterns (they differ only in the sign bit), yet they compare precisely equal. Unless your hash function takes this into account, for example by normalizing -0.0 to +0.0 before hashing the bits, it will produce different hash values for numbers that really compare precisely equal.
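A sketch of the hashing pitfall: hashing the raw IEEE 754 bits separates +0.0 and -0.0, which compare equal, so the bits have to be normalized first. The helper names are hypothetical, and folding every NaN onto one NaN assumes NaNs are treated as equivalent to each other.

```python
import math
import struct


def float_bits(x):
    """Raw IEEE 754 binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def naive_hash(x):
    # Hashes the raw bits: inconsistent for +0.0 / -0.0, which compare equal.
    return float_bits(x)


def normalized_hash(x):
    """Bit hash consistent with == (and with treating all NaNs as equivalent)."""
    if x == 0.0:
        x = 0.0              # fold -0.0 onto +0.0
    elif math.isnan(x):
        x = float("nan")     # fold every NaN sign/payload onto one NaN
    return float_bits(x)


print(0.0 == -0.0)                                    # True
print(naive_hash(0.0) == naive_hash(-0.0))            # False
print(normalized_hash(0.0) == normalized_hash(-0.0))  # True
```

Python's built-in `hash()` already returns the same value for `0.0` and `-0.0`; the issue appears when a hash is computed from the raw representation.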

Answer 4:

As you probably know, comparison-based sorting needs Ω(n log n) comparisons in the worst case. If you know that there are fewer than log2(n) equivalence classes, then partitioning is faster, since you only need to check equivalence against a single member of each equivalence class to determine which part of the partition a given element belongs to.

I.e. your algorithm could be like this:

For each x in our input set X:
    For each equivalence class Y seen so far:
        Choose any member y of Y.
        If x is equivalent to y:
            Add x to Y.
            Resume the outer loop with the next x in X.

    If we get to here then x is not in any of the equiv. classes seen so far.
    Create a new equivalence class with x as its sole member.

If there are m equivalence classes, the inner loop runs at most m times, taking O(nm) time overall. As ShreetvatsaR observes in a comment, there can be at most n equivalence classes, so this is O(n^2). Note this works even if there is not a total ordering on X.
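A direct Python translation of the pseudocode above, as a sketch; the function name is hypothetical. The `for ... else` clause plays the role of "if we get to here": it runs only when the inner loop finished without `break`.

```python
def partition_by_equivalence(xs, equiv):
    """Partition xs using only an equivalence predicate (no order, no hash).

    Compares each x against one representative per class found so far, so it
    makes O(n * m) calls to equiv for m classes, O(n^2) in the worst case.
    """
    classes = []  # each class is a list; cls[0] serves as its representative
    for x in xs:
        for cls in classes:
            if equiv(x, cls[0]):  # any member works, by transitivity
                cls.append(x)
                break             # resume the outer loop with the next x
        else:
            classes.append([x])   # x is not in any class seen so far
    return classes


# Example: integers equivalent iff congruent mod 3.
print(partition_by_equivalence(range(8), lambda a, b: a % 3 == b % 3))
# → [[0, 3, 6], [1, 4, 7], [2, 5]]
```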

Answer 5:

Theoretically, it is always possible (for questions 1 and 2), because of the Well-Ordering Theorem, even when you have an uncountable number of equivalence classes.

Even if you restrict yourself to computable functions, throwawayaccount's answer settles that.

You need to define your question more precisely :-)

In any case, practically speaking, consider the following:

Your data type is the set of unsigned integer arrays. The ordering is lexicographic comparison.

You could consider hash(x) = x, but I suppose that is cheating too :-)

I would say (though I haven't thought much about constructing a hash function, so I might well be wrong) that partitioning by ordering is much more practical than partitioning by hashing here, since computing the hash itself could become impractical. (A hash function certainly exists.)
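Both routes can be sketched for integer arrays. Python lists already compare lexicographically, so sorting and grouping works directly; for hashing, a polynomial hash is one common choice. `array_hash` and its base are hypothetical, picked only for illustration.

```python
from itertools import groupby

MASK64 = (1 << 64) - 1


def array_hash(arr, base=1_000_003):
    """Polynomial hash of an integer array, reduced mod 2**64.

    Equal arrays always get equal hashes; unequal arrays may collide, which
    is allowed. The base is an arbitrary choice for illustration.
    """
    h = len(arr)
    for v in arr:
        h = (h * base + v) & MASK64
    return h


arrays = [[3, 1], [1, 2, 3], [3, 1], [1, 2]]

# Partition by ordering: lists compare lexicographically, so equal arrays
# become adjacent after sorting.
by_sort = [list(g) for _, g in groupby(sorted(arrays))]

# Partition by hashing: bucket on array_hash, then split each bucket by ==
# so that colliding but unequal arrays still land in different classes.
buckets = {}
for a in arrays:
    bucket = buckets.setdefault(array_hash(a), [])
    for cls in bucket:
        if cls[0] == a:
            cls.append(a)
            break
    else:
        bucket.append([a])
by_hash = [cls for bucket in buckets.values() for cls in bucket]

print(by_sort)  # → [[[1, 2]], [[1, 2, 3]], [[3, 1], [3, 1]]]
print(by_hash)
```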

Answer 6:

I believe that...

1- Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?

...it's possible only for infinite (possibly only for non-countable) sets.

2- How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that will map equivalent items to the same hash value.

...same as above.

Answer 7:

EDIT: This answer is wrong

I am not deleting it, because some of the comments below are enlightening.

Not every equivalence relationship implies an order

Since your equivalence relationship should not induce an order, let's take an unordered distance function as the relation.

If we get the set of functions f(x):R -> R as our datatype, and define an equivalence relation as:

f is equivalent to g if f(g(x)) = g(f(x)) for all x (i.e. f and g are commuting operators)

Then you can't sort on that order (no injective function into the real numbers exists): you just can't find a function that maps your datatype to numbers, due to the cardinality of the function space.
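One concrete way to see a problem with this relation: "commutes with" is reflexive and symmetric but not transitive, so it is not an equivalence relation at all. The three functions below are chosen only for illustration, and checking on sample points is a spot check, not a proof (although here f(h(x)) = 2x + 1 and h(f(x)) = 2x + 2 differ for every x).

```python
def f(x):
    return x + 1


def g(x):
    return x  # the identity commutes with every function


def h(x):
    return 2 * x


def commute_at(p, q, xs):
    """Check p(q(x)) == q(p(x)) on sample points (a spot check, not a proof)."""
    return all(p(q(x)) == q(p(x)) for x in xs)


samples = [-2.0, 0.0, 0.5, 3.0]
print(commute_at(f, g, samples))  # True
print(commute_at(g, h, samples))  # True
print(commute_at(f, h, samples))  # False, so transitivity fails
```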

Answer 8:

Suppose that F(X) is a function which maps an element of some data type T to another of the same type, such that for any Y of type T, there is exactly one X of type T such that F(X)=Y. Suppose further that the function is chosen so that there is generally no practical way of finding the X in the above equation for a given Y.

Define F{0}(X) = X, F{1}(X) = F(X), F{2}(X) = F(F(X)), etc., so that F{n}(X) = F(F{n-1}(X)).

Now define a data type Q containing a positive integer and an object of type T, written Q(a,X). Define an equivalence relation thus:

Q(a,X) vs Q(b,Y):

If a > b, the items are equal iff F{a-b}(Y)==X

If a < b, the items are equal iff F{b-a}(X)==Y

If a=b, the items are equal iff X==Y

For any given object Q(a,X) there exists exactly one Z such that F{a}(Z)==X. Two objects are equivalent iff they would have the same Z. One could define an ordering or hash function based upon Z. On the other hand, if F is chosen such that its inverse cannot be practically computed, the only practical way to compare elements may be to use the equivalence function above. I know of no way to define an ordering or hash function without either knowing the largest possible "a" value an item could have, or having a means to invert function F.
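A sketch of the equivalence test, which only ever applies F forwards. The F used here is a hypothetical stand-in: an affine map with an odd multiplier, which is a bijection on 32-bit integers but is easy to invert, so it illustrates the comparison only and not the "hard to invert" property the construction relies on.

```python
MOD = 1 << 32


def F(x):
    # Hypothetical stand-in bijection on 32-bit values (odd multiplier).
    # NOT one-way: it is easy to invert, unlike the F the answer assumes.
    return (x * 2654435761 + 12345) % MOD


def F_iter(n, x):
    """F{n}(x): apply F n times (F{0}(x) = x)."""
    for _ in range(n):
        x = F(x)
    return x


def q_equiv(q1, q2):
    """Compare Q(a, X) and Q(b, Y) using only forward applications of F."""
    (a, x), (b, y) = q1, q2
    if a > b:
        return F_iter(a - b, y) == x
    if a < b:
        return F_iter(b - a, x) == y
    return x == y


z = 42                     # the hidden common "root" Z
q1 = (3, F_iter(3, z))     # Q(3, F{3}(Z))
q2 = (5, F_iter(5, z))     # Q(5, F{5}(Z))
print(q_equiv(q1, q2))                     # True: both have root Z
print(q_equiv(q1, (5, F_iter(5, z + 1))))  # False: different roots
```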

Source: https://stackoverflow.com/questions/3261782/puzzle-need-an-example-of-a-complicated-equivalence-relation-partitioning-t

Tags

algorithm

sorting

puzzle

partitioning