Are IEEE floats valid key types for std::map and std::set?

时间秒杀一切 提交于 2019-11-27 04:25:42

问题


Background

The requirement for a comparator on the key type of an associative container (for example std::map) is that it imposes a strict weak order on the elements of the key type.

For a given comparator comp(x, y) we define equiv(x, y) = !comp(x, y) && !comp(y, x).
The requirements for comp(x, y) being a strict weak order are

  1. Irreflexibility (!comp(x, x) for all x)
  2. Transitivity of the ordering (if comp(a, b) and comp(b, c) then comp(a, c)).
  3. Transitivity of equivalence (if equiv(a, b) and equiv(b, c) then equiv(a, c))

std::less<float> (the default comparator) uses operator<, which does not create a strict weak order because of NaN. Because x < NaN and NaN < x are false for all x, NaN is equivalent to all floats under this comparator, this breaks condition #3: equiv(1.0, NaN) and equiv(NaN, 2.0) but not equiv(1.0, 2.0). For IEEE floats except NaN, it is a strict weak order (where each number has its own equivalence class except for 0 and -0).

The question

Does this mean that one is not allowed by the C++ standard to use IEEE floats (and (long) doubles) as a key type in an associative container because of the above issue, even if I make sure that NaN never gets inserted into the container? I'm not quite sure about the "elements of Key" wording in the standard—if it means all possible elements or just the elements that end up in the container.

Note: The question is not about issues wrt. truncation/rounding, I'll likely post a different question regarding that soon.

Update:

Sigh. I should've posed the question without specifying float, I just thought it was a nice example.

The question really is: Is it allowed to use a comparator that only imposes a strict weak order on the elements that get put into the container, not all possible instances of the key type? Please don't just answer "yes" or "no", I'd like some references to the standard / prior discussions about this / an answer from a commitee member or something.


回答1:


I suspect the restrictions should be taken as referring to the relation's behavior on the values actually used as keys, not necessarily on all values of the type. Don't have time at the moment to go through the standard looking for "smoking gun" language that refers to actual container elements rather than all values of the type.

Similar case: what if a comparator (for a container of pointers or smart pointers) calls a virtual function, and somebody links a derived class of the type it compares, which overrides the virtual function in a way that makes the comparator not a strict weak order? Does the program become undefined even if nobody ever actually uses that derived class?

If in doubt, you can support NaN with a comparator that is a strict weak order:

bool operator()(double a, double b) {
    if ((a == a) && (b == b)) {
        return a < b;
    }
    if ((a != a) && (b != b)) return false;
    // We have one NaN and one non-NaN.
    // Let's say NaN is less than everything
    return (a != a)
}

The last two lines "optimize" to return (b == b);, although I'm not sure the comment optimizes with it.

I think Tomalak has convinced me the language does say the whole type needs to be ordered.

This makes little sense, since a map doesn't conjure values out of nowhere, it only uses values that it's given (and copies of them), but the question is about the rules, and them's the rules as far as I know. C++0x is the same. I wonder if there's a defect report, or any point submitting one.

It's also annoying in that on (very rare) systems where std::less is slow for pointers, you can't use < as the comparator in a map of pointers, even if you know that the pointers are all to elements of the same array. Shame.

Another option is to use the following class as the key type, so keys are checked for NaN only on entry to the map, not on every comparison as above.

struct SaneDouble {
    double value;
    SaneDouble (double d) : value(d) {
        if (d != d) throw std::logic_error();
    }
    static friend bool operator<(SaneDouble lhs, SaneDouble rhs) {
        return lhs.value < rhs.value;
    }
    // possibly a conversion to double
};

This raises another question - clearly someone could create a SaneDouble and then set its value to NaN (assuming the implementation lets them get one from somewhere without crashing). So are "elements of SaneDouble" strict-weak-ordered or not? Does my half-hearted attempt to create a class invariant in the constructor make my program undefined even if nobody actually breaks the invariant, simply because they could and therefore the results of doing so are "elements of SaneDouble"? Is it really the intention of the standard that the program's behavior is defined if and only if value is marked private? Does the standard actually define anywhere what "the elements" of a type are?

I wonder whether we should interpret "elements of Key" to mean that the comparator induces a strict weak order on some elements of Key. Presumably including the ones actually used. "I have doughnuts" doesn't mean I have every doughnut. It's a stretch, though.




回答2:


In short: this is fine (in the sense that your question is about).

If you prevent the values (i.e. NaN) that wouldn't satisfy the ordering requirements, then the behaviour is entirely defined.




回答3:


Putting floats as keys of an associative container is sometimes a bad idea since the equality semantics are quite poor. But it depends on what you want to do. Bear in mind that NaN and infinities are usually not a problem. You can handle them with special comparator functions (I usually don't), and obviously the requirements of the standard are about the actual keys that will end up in the container, that you can see as a subset of the key type. A map implementation will never introduce key instances that you did not feed yourself into the map.

I used once this predicate for a map where I could disallow two very close values:

struct FuzzyComparer
{
    template <typename T>
    bool operator()(T x, T y) const
    {
        static const T oneMinusEps = (T)1. - 64 * std::numeric_limits<T>::epsilon();
        return x / y < oneMinusEps;
    }
};

This does not provide you with a good equivalence relation. This is only useful when you want to store discrete floating point values, and you are ready to tolerate some roundoff error in the computation which yields the key you want to retrieve. For the actual keys that will be inserted, it yields an equivalence relation.

You will not be able to come up with a good equivalence relation on floating point numbers which is compatible with the arithmetic operations, ie. with makes addition and multiplication associative.

You either have to throw away the "equivalence relation" part, which shouldn't be that a big deal in real world code (I doubt the transitivity of the eq. relation is used to an extent that would bother you in a typical map implementation) or throw away the compatibility with arithmetic operations. But what is the point of using floating point values as keys then ?




回答4:


You can put floats and doubles as the key to a std::map or std::set for the purpose of them being correctly sorted.

The issue is when it comes to uniqueness, as you may well get duplicates occurring because of the way floats and doubles are compared.

Using a comparison based with epsilons (close values considered equal) is also not without danger as you could have genuine non-duplicates eliminated for being too close.

The "lookup" may not find an element that is there if you use simple "find", so you may wish to use some kind of epsilon lookup with upper_bound() on x-delta where x is what you are really looking for, and allowing a value that is less than x+delta.

Given all that, there is clearly no problem of using float or double as a key in std::multiset or std::multimap as long as you use upper_bound to search rather than equal_range.

With regards to NaNs, if the set or map is not empty they would be considered "equal" to any element already there and therefore would not insert. If you insert a NaN first, all subsequent insertions should fail.

They would be considered equal because x<NaN and NaN<x both evaluate false.

Of course by the same logic if it's a map and you call

myMap[ NaN ] = x;

it could justifiably modify any element.

(Same if you do find(NaN), which could return any iterator).

Therefore if you are going to have NaNs in any part of this calculation use a special comparator such as Steve Jessop's.



来源:https://stackoverflow.com/questions/4816156/are-ieee-floats-valid-key-types-for-stdmap-and-stdset

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!