I have N points in D dimensions, where let's say N is 1 million and D is 100. All my points have binary coordinates, i.e. they lie in {0, 1}^D, and I am only interested in speed.
I wrote a simple program to populate and contiguously access a data structure with binary data:
std::vector<int>
std::vector<char>
std::vector<bool>
std::bitset
I measured times with std::chrono::high_resolution_clock. I used the -O3 optimization flag, N = 1 million and D = 100.
This is the code for vectors:
#include <vector>
#include <iostream>
#include <random>
#include <cmath>
#include <numeric>
#include <functional> //plus, equal_to, not2
#include <ctime>
#include <ratio>
#include <chrono>
#define T int
// Hamming distance between s1 and the equal-length range starting at s2
unsigned int hd(const std::vector<T>& s1, std::vector<T>::const_iterator s2)
{
return std::inner_product(
s1.begin(), s1.end(), s2,
0, std::plus<unsigned int>(),
std::not2(std::equal_to<std::vector<T>::value_type>())
);
}
std::uniform_int_distribution<int> uni_bit_distribution(0, 1);
std::default_random_engine generator(std::chrono::system_clock::now().time_since_epoch().count());
// g++ -Wall -O3 bitint.cpp -o bitint
int main()
{
const int N = 1000000;
const int D = 100;
// N is large: keep the counters on the heap (two arrays of N unsigned ints
// would take ~8 MB of stack and likely overflow it)
std::vector<unsigned int> hamming_dist(N, 0);
std::vector<unsigned int> ham_d(N, 0);
std::vector<T> q;
for(int i = 0; i < D; ++i)
q.push_back(uni_bit_distribution(generator));
using namespace std::chrono;
high_resolution_clock::time_point t1 = high_resolution_clock::now();
std::vector<T> v;
v.resize(N * D);
for(int i = 0; i < N; ++i)
for(int j = 0; j < D; ++j)
v[j + i * D] = uni_bit_distribution(generator);
high_resolution_clock::time_point t2 = high_resolution_clock::now();
duration<double> time_span = duration_cast<duration<double> >(t2 - t1);
std::cout << "Build " << time_span.count() << " seconds.\n";
t1 = high_resolution_clock::now();
for(int i = 0; i < N; ++i)
for(int j = 0; j < D; ++j)
hamming_dist[i] += (v[j + i * D] != q[j]);
t2 = high_resolution_clock::now();
time_span = duration_cast<duration<double> >(t2 - t1);
std::cout << "No function hamming distance " << time_span.count() << " seconds.\n";
t1 = high_resolution_clock::now();
for(int i = 0; i < N; ++i)
ham_d[i] = hd(q, v.begin() + (i * D));
t2 = high_resolution_clock::now();
time_span = duration_cast<duration<double> >(t2 - t1);
std::cout << "Yes function hamming distance " << time_span.count() << " seconds.\n";
return 0;
}
The code for std::bitset
can be found in: XOR bitset when 2D bitset is stored as 1D
For std::vector<int>
I got:
Build 3.80404 seconds.
No function hamming distance 0.0322335 seconds.
Yes function hamming distance 0.0352869 seconds.
For std::vector<char>
I got:
Build 8.2e-07 seconds.
No function hamming distance 8.4e-08 seconds.
Yes function hamming distance 2.01e-07 seconds.
For std::vector<bool>
I got:
Build 4.34496 seconds.
No function hamming distance 0.162005 seconds.
Yes function hamming distance 0.258315 seconds.
For std::bitset
I got:
Build 4.28947 seconds.
Hamming distance 0.00385685 seconds.
std::vector<char>
seems to be the winner.
Locality of reference will likely be the driving force, so it's fairly obvious that you should represent the D coordinates of a single point as a contiguous bit vector. std::bitset<D>
would be a logical choice.
However, the next important thing to realize is that locality benefits come easily for working sets up to about 4 KB. This means that you should not pick a single point and compare it against all other N-1 points. Instead, group points into blocks of roughly 4 KB each and compare those blocks against each other. Both approaches are O(N*N)
, but the second will be much faster.
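A minimal sketch of that tiling idea, assuming points are stored as std::bitset<D> rows; the tile size of 256 is an assumed value (256 rows of bitset<100>, 16 bytes each, is about 4 KB), and the function names are illustrative:

```cpp
#include <algorithm> // std::min
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t D = 100;     // bits per point, from the question
constexpr std::size_t TILE = 256;  // assumed: ~4 KB worth of bitset<100> rows

// Compare every point of one tile against every point of another;
// both tiles stay cache-resident while the inner loops run.
void compare_tiles(const std::vector<std::bitset<D>>& pts,
                   std::size_t i0, std::size_t i1,
                   std::size_t j0, std::size_t j1,
                   std::vector<unsigned>& out)
{
    const std::size_t n = pts.size();
    for (std::size_t i = i0; i < i1; ++i)
        for (std::size_t j = j0; j < j1; ++j)
            out[i * n + j] = (pts[i] ^ pts[j]).count(); // Hamming distance
}

// All-pairs distances, tile by tile: still O(N*N) work, better locality.
void all_pairs(const std::vector<std::bitset<D>>& pts, std::vector<unsigned>& out)
{
    const std::size_t n = pts.size();
    for (std::size_t i = 0; i < n; i += TILE)
        for (std::size_t j = 0; j < n; j += TILE)
            compare_tiles(pts, i, std::min(i + TILE, n),
                          j, std::min(j + TILE, n), out);
}
```

The total work is unchanged; only the traversal order differs, so each tile of rows is reused many times while it is still in cache.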
You may be able to beat O(N*N)
by using the triangle inequality: Hamming(a,b) + Hamming(b,c) >= Hamming(a,c)
. I'm not sure exactly how; it probably depends on what output you want. The naive output, an N*N table of distances, is unavoidably O(N*N)
.
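One way the triangle inequality can help, if you only need nearest neighbours rather than the full table, is pivot-based pruning: |Hamming(q,b) - Hamming(b,x)| is a lower bound on Hamming(q,x), so points whose bound already exceeds the best distance found can be skipped. This is a hedged sketch, not the answer's method; the pivot choice and all names are illustrative:

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t D = 100;
using Point = std::bitset<D>;

// Nearest neighbour of q among pts, pruning with the triangle inequality.
// pivot_dist[i] must hold the precomputed value (pivot ^ pts[i]).count().
std::size_t nearest(const Point& q, const std::vector<Point>& pts,
                    const Point& pivot, const std::vector<unsigned>& pivot_dist)
{
    const unsigned dq = (q ^ pivot).count();
    unsigned best = D + 1;
    std::size_t best_i = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        // Lower bound on Hamming(q, pts[i]); if it cannot beat best, skip.
        const unsigned lower =
            dq > pivot_dist[i] ? dq - pivot_dist[i] : pivot_dist[i] - dq;
        if (lower >= best) continue;
        const unsigned d = (q ^ pts[i]).count();
        if (d < best) { best = d; best_i = i; }
    }
    return best_i;
}
```

How much this prunes depends on the distance distribution; with random uniform bits most pairwise distances cluster around D/2, so a single pivot may skip little.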
If the coordinates are independently and uniformly distributed, and you want to find the Hamming distance between two randomly chosen points, the most efficient layout is a packed array of bits.
This packed array would ideally be chunked into the largest block size over which your popcnt
instruction works: 64 bits. The Hamming distance is the sum of popcnt(x_blocks[i] ^ y_blocks[i])
. On processors with efficient unaligned accesses, byte alignment with unaligned reads is likely to be most efficient. On processors where unaligned reads incur a penalty, one should consider whether the memory overhead of aligned rows is worth faster logic.