Question
1. Description of the problem
I am trying to pick the most appropriate (efficient) container to store unique n-dimensional vectors composed of floating-point numbers. In the whole problem, the most important steps (related to this question) are:
- Get a new vector from an external program (during the same run, all vectors have the same dimensionality).
- Check (as fast as possible) whether the new point is already in this container:
  - if it exists, skip a lot of expensive steps and do the other steps;
  - if it doesn't, insert it into the container (ordering in the container is not important) and do the other steps.
I don't know in advance how many vectors I will have, but the maximum number is prescribed and equal to 100,000. Moreover, I always get only one new vector per iteration. Thus, at the beginning most of these new vectors are unique and will be inserted into the container, but what happens later is hard to predict; a lot will depend on the definition of unique vectors and on the tolerance values.
Thus my goal is to choose the right container for this (type of) situation(s).
2. Choosing the right container
I did a bit of review. From S. Meyers, Effective STL, Item 1: Choose your containers with care:
Is lookup speed a critical consideration? If so, you’ll want to look at hashed containers (see Item 25), sorted vectors (see Item 23), and the standard associative containers — probably in that order.
and from David Moore's excellent flowchart Choosing a Container, it looks like all three options suggested in S. Meyers, Item 1 are worth a broader investigation.
2.1 Complexity (based on cppreference.com)
Let's start with a very brief look at the theoretical complexity of the lookup and insertion procedures for each of the three considered options:
- For vectors:
  - find_if(): linear, O(n).
  - push_back(): constant (amortized), but causes reallocation if the new size() is greater than the old capacity().
- For sets:
  - insert(): logarithmic in the size of the container, O(log(size())).
- For unordered sets:
  - insert(): average case O(1), worst case O(size()).
3. Benchmarking different containers
In all experiments, I modelled the situation with randomly generated 3-dimensional vectors filled with real values from the interval [0, 1).
EDIT:
Compiler used: Apple LLVM version 7.0.2 (clang-700.1.81).
Each benchmark was compiled twice: once in release mode with optimization level -O3, and once without any optimization flags.
3.1 using unsorted vector
First, why an unsorted vector? My scenario differs considerably from the one described in S. Meyers, Effective STL, Item 23: Consider replacing associative containers with sorted vectors. Therefore I do not see any advantage in using a sorted vector in this situation.
Second, two vectors x and y are assumed to be equal if
EuclideanDistance2(x, y) < tolerance^2.
Taking this into account, my initial (probably pretty poor) implementation using a vector is the following.
Benchmarking part of the implementation using the vector container:
// create a vector of double arrays (vda)
std::vector<std::array<double, N>> vda;
const double tol = 1e-6; // set default tolerance
// record start time
auto start = std::chrono::steady_clock::now();
// Generate and insert one hundred thousand new double arrays
for (size_t i = 0; i < 100000; ++i) {
    // Get a new random double array (da)
    std::array<double, N> da = getRandomArray();
    // Look for an existing array within tolerance of da
    auto pos = std::find_if(vda.begin(), vda.end(), // range
        [=, &da](const std::array<double, N> &darr) { // search criterion
            return EuclideanDistance2(darr.begin(), darr.end(), da.begin()) < tol * tol;
        });
    if (pos == vda.end()) {
        vda.push_back(da); // Insert array
    }
}
// record finish time
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> diff = end - start;
std::cout << "Time to generate and insert unique elements into vector: "
          << diff.count() << " s\n";
std::cout << "vector's size = " << vda.size() << std::endl;
Here is how the random N-dimensional real vectors (N-element arrays) are generated:
// return an array of N uniformly distributed random numbers from 0 to 1
std::array<double, N> getRandomArray() {
    // Engines and distributions retain state, thus defined as static
    static std::default_random_engine e;                   // engine
    static std::uniform_real_distribution<double> d(0, 1); // distribution
    std::array<double, N> ret;
    for (size_t i = 0; i < N; ++i) {
        ret[i] = d(e);
    }
    return ret;
}
and the squared Euclidean distance is calculated:
// Return the squared Euclidean distance between two ranges
template <typename InputIt1, typename InputIt2>
double EuclideanDistance2(InputIt1 beg1, InputIt1 end1, InputIt2 beg2) {
    double val = 0.0;
    while (beg1 != end1) {
        double dist = (*beg1++) - (*beg2++);
        val += dist * dist;
    }
    return val;
}
3.1.1 Testing vector's performance
In the table below I summarise the average execution time of 10 independent runs and the size of the final container for different tolerance (eps) values. Smaller tolerance values lead to a higher number of unique elements (more insertions), while larger values lead to fewer unique vectors but longer lookups.
| eps  | time (s), with -O3 / without optimization | size   |
|------|-------------------------------------------|--------|
| 1e-6 | 13.1496 / 111.83                          | 100000 |
| 1e-3 | 14.1295 / 114.254                         | 99978  |
| 1e-2 | 10.5931 / 90.674                          | 82868  |
| 1e-1 | 0.0551718 / 0.462546                      | 749    |
From the results, it seems that the most time-consuming part of the vector approach is the lookup (find_if()).
Edit: It is also clear that -O3 optimization does a really good job of improving the vector's performance.
3.2 Using set
Benchmarking part of the implementation using the set container:
// create a set of double arrays (sda) with a special sorting criterion
std::set<std::array<double, N>, compare_arrays> sda;
// create a vector of double arrays (vda)
std::vector<std::array<double, N>> vda;
// record start time
auto start = std::chrono::steady_clock::now();
// Generate and insert one hundred thousand new double arrays
for (size_t i = 0; i < 100000; ++i) {
    // Get a new random double array (da)
    std::array<double, N> da = getRandomArray();
    // Inserts into the container if the container doesn't already contain it.
    sda.insert(da);
}
// record finish time
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> diff = end - start;
std::cout << "Time to generate and insert unique elements into SET: "
          << diff.count() << " s\n";
std::cout << "set size = " << sda.size() << std::endl;
where the sorting criterion is based on a flawed answer (it breaks strict weak ordering). At the moment I just wanted to see (approximately) what I can expect from the different containers, and later decide which one is the best.
// return whether the elements in arr1 are "lexicographically less than"
// the elements in arr2
struct compare_arrays {
    bool operator() (const std::array<double, N>& arr1,
                     const std::array<double, N>& arr2) const {
        // Lexicographical comparison compares using an element-by-element rule
        return std::lexicographical_compare(arr1.begin(), arr1.end(), // 1st range
                                            arr2.begin(), arr2.end(), // 2nd range
                                            compare_doubles);         // sorting criterion
    }
    // return true if x < y and x is not within tolerance distance of y
    static bool compare_doubles(double x, double y) {
        return (x < y) && !(fabs(x - y) < tolerance);
    }
private:
    static constexpr double tolerance = 1e-6; // Comparison tolerance
};
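To make the "flawed" remark concrete, here is a minimal sketch (with hypothetical values chosen purely for illustration) showing that the equivalence induced by compare_doubles is not transitive, which is exactly what breaks the strict weak ordering required by std::set:

#include <cassert>
#include <cmath>

constexpr double tolerance = 1e-6; // same tolerance as in compare_arrays

// "less than and not within tolerance", as in compare_doubles
bool less(double x, double y) {
    return (x < y) && !(std::fabs(x - y) < tolerance);
}

int main() {
    // hypothetical values chosen only to illustrate the problem
    double a = 0.0, b = 0.9e-6, c = 1.8e-6;
    // a and b are "equivalent" (neither is less than the other), and so are b and c,
    assert(!less(a, b) && !less(b, a));
    assert(!less(b, c) && !less(c, b));
    // but a is "less" than c, so the equivalence induced by the comparator is not
    // transitive - the strict weak ordering required by std::set is broken.
    assert(less(a, c));
    return 0;
}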
3.2.1 Testing set's performance
In the table below, I summarise the execution time and the size of the container for different tolerance (eps) values. The same eps values were used, but for the set the definition of equivalence is different.
| eps  | time (s), with -O3 / without optimization | size   |
|------|-------------------------------------------|--------|
| 1e-6 | 0.041414 / 1.51723                        | 100000 |
| 1e-3 | 0.0457692 / 0.136944                      | 99988  |
| 1e-2 | 0.0501 / 0.13808                          | 90828  |
| 1e-1 | 0.0149597 / 0.0777621                     | 2007   |
The performance difference compared with the vector approach is massive. The main concern now is the flawed sorting criterion.
Edit: -O3 optimization also does a good job of improving the set's performance.
3.3 Using unordered set
Finally, I was eager to try the unordered set, as my expectations after reading a bit of Josuttis, The C++ Standard Library: A Tutorial and Reference,
As long as you only insert, erase, and find elements with a specific value, unordered containers provide the best running-time behavior because all these operations have amortized constant complexity.
were really high, but I was cautious, since
Providing a good hash function is trickier than it sounds.
Benchmarking part of the implementation using the unordered_set container:
// create an unordered set of double arrays (usda)
std::unordered_set<std::array<double, N>, ArrayHash, ArrayEqual> usda;
// record start time
auto start = std::chrono::steady_clock::now();
// Generate and insert one hundred thousand new double arrays
for (size_t i = 0; i < 100000; ++i) {
    // Get a new random double array (da)
    std::array<double, N> da = getRandomArray();
    usda.insert(da);
}
// record finish time
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> diff = end - start;
std::cout << "Time to generate and insert unique elements into UNORD. SET: "
          << diff.count() << " s\n";
std::cout << "unord. set size() = " << usda.size() << std::endl;
where a naive hash function is used:
// Hash Function
struct ArrayHash {
    std::size_t operator() (const std::array<double, N>& arr) const {
        std::size_t ret = 0; // note: must be initialized, otherwise the result is undefined
        for (const double elem : arr) {
            ret += std::hash<double>()(elem);
        }
        return ret;
    }
};
and the equivalence criterion:
// Equivalence Criterion
struct ArrayEqual {
    bool operator() (const std::array<double, N>& arr1,
                     const std::array<double, N>& arr2) const {
        return EuclideanDistance2(arr1.begin(), arr1.end(), arr2.begin()) < tol * tol;
    }
private:
    static constexpr double tol = 1e-6; // Comparison tolerance
};
3.3.1 Testing unordered set's performance
In the last table below, I again summarise the execution time and the size of the container for different tolerance (eps) values. Note that here both the times and the container sizes differ between the -O3 and unoptimized builds.
| eps  | time (s), with -O3 / without optimization | size, with -O3 / without optimization |
|------|-------------------------------------------|---------------------------------------|
| 1e-6 | 57.4823 / 0.0590703                       | 100000 / 100000                       |
| 1e-3 | 57.9588 / 0.0618149                       | 99978 / 100000                        |
| 1e-2 | 43.2816 / 0.0595529                       | 82873 / 100000                        |
| 1e-1 | 0.238788 / 0.0578297                      | 781 / 99759                           |
In short, the execution times are the best compared with the other two approaches (at least without optimization); however, even with quite a loose tolerance (1e-1) almost all random vectors were identified as unique. So in my case I save time on the lookup, but waste much more time doing the other expensive steps of my problem. I guess this is because my hash function is really naive?
Edit: This is the most unexpected behaviour. Turning on -O3 optimization for the unordered set decreased the performance awfully. Even more surprising, the number of unique elements depends on the optimization flag, which it shouldn't. Probably this only means that I must provide a better hash function!?
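A plausible explanation (not verified here): std::unordered_set only compares elements whose hashes land in the same bucket, so the hash function must produce identical values for elements that ArrayEqual treats as equal. Hashing raw doubles cannot guarantee that for "within tolerance" equality, so near-duplicates end up in different buckets and are counted as unique. A minimal sketch of one possible workaround, assuming the same N and tolerance as above (QuantizedArrayHash and QuantizedArrayEqual are hypothetical names, not part of the original code), is to snap each coordinate to a tolerance-sized grid before hashing and comparing:

#include <array>
#include <cmath>
#include <cstddef>
#include <functional>

static constexpr double tol = 1e-6;

struct QuantizedArrayHash {
    std::size_t operator()(const std::array<double, N>& arr) const {
        std::size_t seed = 0;
        for (double elem : arr) {
            // snap each coordinate to its tolerance-sized grid cell before hashing
            long long cell = static_cast<long long>(std::floor(elem / tol));
            seed = seed * 1000003u + std::hash<long long>()(cell); // simple polynomial combine
        }
        return seed;
    }
};

struct QuantizedArrayEqual {
    bool operator()(const std::array<double, N>& a,
                    const std::array<double, N>& b) const {
        for (std::size_t i = 0; i < N; ++i) {
            // "equal" now means "in the same grid cell" for every coordinate
            if (std::floor(a[i] / tol) != std::floor(b[i] / tol))
                return false;
        }
        return true;
    }
};

Note that this changes the notion of equality from "within Euclidean distance tol" to "in the same grid cell", so two points just across a cell boundary are still treated as distinct; but it keeps the hash consistent with the equality predicate, which std::unordered_set requires for correct lookups.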
4. Open questions
- As I know in advance the maximum possible number of unique vectors, would it make sense to use std::vector::reserve(100000)?
According to The C++ Programming Language by Bjarne Stroustrup, reserve does not have a big impact on performance:
I used to be careful about using reserve() when I was reading into a vector. I was surprised to find that for essentially all of my uses, calling reserve() did not measurably affect performance. The default growth strategy worked just as well as my estimates, so I stopped trying to improve performance using reserve().
I repeated the same experiment using vectors and eps = 1e-6 with reserve(100000), and in this case the total execution time was 111.461 s compared to 111.83 s without reserve(). So the difference is negligible.
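For completeness, a minimal sketch of how reserve() could be slotted into the vector benchmark from section 3.1 (only the setup changes; the loop stays the same):

// create a vector of double arrays (vda) and pre-allocate capacity
std::vector<std::array<double, N>> vda;
vda.reserve(100000); // avoids reallocations during push_back; does not change size()
// ... the generate / find_if / push_back loop from section 3.1 is unchanged

Since the dominant cost in section 3.1 is the O(n) find_if scan rather than reallocation, this is consistent with the negligible difference observed above.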
- How can I provide a better hash function for the situation described in 3.3?
- Any general comments about the fairness of this comparison? How can I improve it?
- Any general comments on how to make my code better and more efficient are very welcome; I love to learn from you guys! ;)
P.S. Finally, is there proper markdown support on StackOverflow for creating tables? In the final version of this question (benchmarking) I would like to include a final summarizing table.
P.P.S. Please feel free to correct my poor English.
Answer 1:
For the hash function, it may be better to use ^= instead of += to make the hash more random.
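For example, a minimal sketch of that suggestion (the mixing constant and shifts are a common boost::hash_combine-style choice and are an addition for illustration, not part of the original answer):

// Hash function combining per-element hashes with ^= plus some extra mixing
struct ArrayHash {
    std::size_t operator() (const std::array<double, N>& arr) const {
        std::size_t seed = 0;
        for (const double elem : arr) {
            // ^= spreads bits better than +=; the constant and shifts mix the seed further
            seed ^= std::hash<double>()(elem) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};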
For the comparison, you may combine ArrayEqual with EuclideanDistance2 so that it can return early once the accumulated squared distance exceeds the tolerance:
struct ArrayEqual {
    bool operator() (const std::array<double, N>& arr1,
                     const std::array<double, N>& arr2) const {
        auto beg1 = arr1.begin(), end1 = arr1.end(), beg2 = arr2.begin();
        double val = 0.0;
        while (beg1 != end1) {
            double dist = (*beg1++) - (*beg2++);
            val += dist * dist;
            // bail out as soon as the partial squared distance already exceeds tol^2
            if (val >= tol * tol)
                return false;
        }
        return true;
    }
private:
    static constexpr double tol = 1e-6; // Comparison tolerance
};
Source: https://stackoverflow.com/questions/38687879/appropriate-container-for-the-fast-insertion-and-lookup-of-n-dimensional-real-ve