Inserting multiple not-a-numbers into a std::unordered_set<double>

时间秒杀一切 提交于 2019-12-22 07:05:20

问题


One of consequences of the IEEE 754 standard is the non-intuitive behavior of std::unordered_set<double>, when not-a-number elements (NANs) are inserted.

Due to the fact that NAN!=NAN, after the following sequence:

#include <iostream>
#include <cmath>
#include <unordered_set>

int main(){
    std::unordered_set<double> set;
    set.insert(NAN);
    set.insert(NAN);
    std::cout<<"Number of elements "<<set.size()<<"\n";  //there are 2 elements!
}

there are two elements in the set(see it live): NAN and NAN!

Mine main issue with this is, that when N NANs are inserted into the hash-set, they all hit the same hash-bucket and the performance of N inserts into the hash-set degenerates to the worst-case running time - O(N^2).

For an example, see the listing at the end of the question or here live - inserting NAN takes some order of magnitude more time than a "normal" floating number.

My question: is it possible (and if yes - how) to tweak std::unordered_set<double> in such a way, that there is at most one NAN-element in the set, no matter the flavor of inserted NANs (NAN, -NAN and so on)?


Listing:

#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>

constexpr int N=5000;
void test_insert(double value)
{
    std::unordered_set<double> s;
    auto begin = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; i++) {
        s.insert(value);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
    std::cout << "Number of elements: "<<s.size()<<"\n";
}

int main(){
    std::cout << "Not NAN\n";
    test_insert(1.0);           //takes 0.0001 s
    std::cout << "NAN\n";
    test_insert(NAN);           //takes 0.2 s
}

回答1:


From your comment in Andrews answer,

I think the problem with this solution: -NAN will have a different hash-value then NAN but for hash-function h must hold: if x==y then also h(x)==h(y)

This does hash differently, so you need to also define your own hash function if you want h(-NAN) == h(NAN) ...

(augmented from @Andrew answer)

#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>

struct EqualPred
{
    constexpr bool operator()(const double& lhs, const double& rhs) const
    {
        if (std::isnan(lhs) && std::isnan(rhs)) return true;
        return lhs == rhs;
    }
};

template <typename T>
struct Hash
{
    size_t operator()(const T& value) const
    {
        return  std::hash<T>()( std::isnan(value) ? NAN : value);
    }


};

std::unordered_set<double, Hash<double>, EqualPred> s;

constexpr int N=5000;
void test_insert(double value)
{

    auto begin = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; i++) {
        s.insert(value);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
    std::cout << "Number of elements: "<<s.size()<<"\n";
}

int main(){
    std::cout << "Not NAN\n";
    test_insert(1.0);           //takes 0.0001 s
    std::cout << "NAN\n";
    test_insert(NAN);  
    test_insert(-NAN);

    std::cout << s.size() << std::endl;
    //takes 0.2 s
}

Demo




回答2:


You can define your own predicate to compare the keys, and ensure NaNs compare equal for this purpose. This can be supplied as the third parameter to the std::unordered_set class template.

See the definition of EqualPred below (code copied from question and modified), and its use in declaring the unordered_set variable. I took the default for the second parameter from the documentation at https://en.cppreference.com/w/cpp/container/unordered_set

Live demo: http://coliru.stacked-crooked.com/a/7085936431e6698f

#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>

struct EqualPred
{
    constexpr bool operator()(const double& lhs, const double& rhs) const
    {
        if (std::isnan(lhs) && std::isnan(rhs)) return true;
        return lhs == rhs;
    }
};

constexpr int N=5000;
void test_insert(double value)
{
    std::unordered_set<double, std::hash<double>, EqualPred> s;
    auto begin = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; i++) {
        s.insert(value);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
    std::cout << "Number of elements: "<<s.size()<<"\n";
}

int main(){
    std::cout << "Not NAN\n";
    test_insert(1.0);           //takes 0.0001 s
    std::cout << "NAN\n";
    test_insert(NAN);           //takes 0.2 s
}

It is worth noting (thanks to @ead's comment) that -NaN and +NaN may hash to different values. If you want to handle these as identical, you'll need to provide a different implementation of the second template parameter, the hash function. This should detect any NaNs and hash the same NaN each time.



来源:https://stackoverflow.com/questions/51042495/inserting-multiple-not-a-numbers-into-a-stdunordered-setdouble

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!