问题
One of consequences of the IEEE 754 standard is the non-intuitive behavior of std::unordered_set<double>
, when not-a-number elements (NAN
s) are inserted.
Due to the fact that NAN!=NAN
, after the following sequence:
#include <iostream>
#include <cmath>
#include <unordered_set>
int main(){
std::unordered_set<double> set;
set.insert(NAN);
set.insert(NAN);
std::cout<<"Number of elements "<<set.size()<<"\n"; //there are 2 elements!
}
there are two elements in the set
(see it live): NAN
and NAN
!
Mine main issue with this is, that when N
NAN
s are inserted into the hash-set, they all hit the same hash-bucket and the performance of N
inserts into the hash-set degenerates to the worst-case running time - O(N^2)
.
For an example, see the listing at the end of the question or here live - inserting NAN
takes some order of magnitude more time than a "normal" floating number.
My question: is it possible (and if yes - how) to tweak std::unordered_set<double>
in such a way, that there is at most one NAN
-element in the set, no matter the flavor of inserted NAN
s (NAN
, -NAN
and so on)?
Listing:
#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>
constexpr int N=5000;
void test_insert(double value)
{
std::unordered_set<double> s;
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < N; i++) {
s.insert(value);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
std::cout << "Number of elements: "<<s.size()<<"\n";
}
int main(){
std::cout << "Not NAN\n";
test_insert(1.0); //takes 0.0001 s
std::cout << "NAN\n";
test_insert(NAN); //takes 0.2 s
}
回答1:
From your comment in Andrews answer,
I think the problem with this solution: -NAN will have a different hash-value then NAN but for hash-function h must hold: if x==y then also h(x)==h(y)
This does hash differently, so you need to also define your own hash function if you want h(-NAN) == h(NAN)
...
(augmented from @Andrew answer)
#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>
struct EqualPred
{
constexpr bool operator()(const double& lhs, const double& rhs) const
{
if (std::isnan(lhs) && std::isnan(rhs)) return true;
return lhs == rhs;
}
};
template <typename T>
struct Hash
{
size_t operator()(const T& value) const
{
return std::hash<T>()( std::isnan(value) ? NAN : value);
}
};
std::unordered_set<double, Hash<double>, EqualPred> s;
constexpr int N=5000;
void test_insert(double value)
{
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < N; i++) {
s.insert(value);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
std::cout << "Number of elements: "<<s.size()<<"\n";
}
int main(){
std::cout << "Not NAN\n";
test_insert(1.0); //takes 0.0001 s
std::cout << "NAN\n";
test_insert(NAN);
test_insert(-NAN);
std::cout << s.size() << std::endl;
//takes 0.2 s
}
Demo
回答2:
You can define your own predicate to compare the keys, and ensure NaNs compare equal for this purpose. This can be supplied as the third parameter to the std::unordered_set
class template.
See the definition of EqualPred
below (code copied from question and modified), and its use in declaring the unordered_set
variable. I took the default for the second parameter from the documentation at https://en.cppreference.com/w/cpp/container/unordered_set
Live demo: http://coliru.stacked-crooked.com/a/7085936431e6698f
#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>
struct EqualPred
{
constexpr bool operator()(const double& lhs, const double& rhs) const
{
if (std::isnan(lhs) && std::isnan(rhs)) return true;
return lhs == rhs;
}
};
constexpr int N=5000;
void test_insert(double value)
{
std::unordered_set<double, std::hash<double>, EqualPred> s;
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < N; i++) {
s.insert(value);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
std::cout << "Number of elements: "<<s.size()<<"\n";
}
int main(){
std::cout << "Not NAN\n";
test_insert(1.0); //takes 0.0001 s
std::cout << "NAN\n";
test_insert(NAN); //takes 0.2 s
}
It is worth noting (thanks to @ead's comment) that -NaN
and +NaN
may hash to different values. If you want to handle these as identical, you'll need to provide a different implementation of the second template parameter, the hash function. This should detect any NaNs and hash the same NaN each time.
来源:https://stackoverflow.com/questions/51042495/inserting-multiple-not-a-numbers-into-a-stdunordered-setdouble