How to store binary data when you only care about speed?

我寻月下人不归 2020-12-21 22:58

I have N points in D dimensions, where, say, N is 1 million and D is 100. All my points have binary coordinates, i.e. they lie in {0, 1}^D, and I am only interested in speed.

3 answers
  • 2020-12-21 23:28

    I wrote a simple program that populates a data structure with binary data and then reads it back contiguously, using each of the following containers:

    1. std::vector<int>
    2. std::vector<char>
    3. std::vector<bool>
    4. std::bitset

    For timing I used my Time measurements code. I compiled with the -O3 optimization flag and used N = 1 million and D = 100.

    This is the code for vectors:

    #include <vector>
    #include <iostream>
    #include <random>
    #include <cmath>
    #include <numeric>
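    #include <functional> // plus, not_equal_to
    
    #include <ctime>
    #include <ratio>
    #include <chrono>
    
    using T = int; // element type under test: int, char or bool
    
    // Hamming distance between s1 and the equally long range starting at s2.
    unsigned int hd(const std::vector<T>& s1, const std::vector<T>::iterator s2)
    {
        // std::not2 is deprecated in C++17 and removed in C++20,
        // so compare with std::not_equal_to directly.
        return std::inner_product(
            s1.begin(), s1.end(), s2, 
            0u, std::plus<unsigned int>(),
            std::not_equal_to<std::vector<T>::value_type>()
        );
    }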
    
    
    std::uniform_int_distribution<int> uni_bit_distribution(0, 1);
    std::default_random_engine generator(std::chrono::system_clock::now().time_since_epoch().count());
    
    // g++ -Wall -O3 bitint.cpp -o bitint
    int main()
    {
        const int N = 1000000;
        const int D = 100;
        // Heap-allocated: two 4 MB arrays on the stack can overflow it.
        std::vector<unsigned int> hamming_dist(N, 0);
        std::vector<unsigned int> ham_d(N, 0);
    
        std::vector<T> q;
        for(int i = 0; i < D; ++i)
            q.push_back(uni_bit_distribution(generator));
    
        using namespace std::chrono;
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
    
    
        std::vector<T> v;
        v.resize(N * D);
        for(int i = 0; i < N; ++i)
            for(int j = 0; j < D; ++j)
                v[j + i * D] = uni_bit_distribution(generator);
    
    
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
    
        duration<double> time_span = duration_cast<duration<double> >(t2 - t1);
    
        std::cout << "Build " << time_span.count() << " seconds.\n";
    
        t1 = high_resolution_clock::now();
    
        for(int i = 0; i < N; ++i)
            for(int j = 0; j < D; ++j)
                hamming_dist[i] += (v[j + i * D] != q[j]);
    
        t2 = high_resolution_clock::now();
        time_span = duration_cast<duration<double> >(t2 - t1);
        std::cout << "No function hamming distance " << time_span.count() << " seconds.\n";
    
        t1 = high_resolution_clock::now();
    
        for(int i = 0; i < N; ++i)
            ham_d[i] = hd(q, v.begin() + (i * D));
    
        t2 = high_resolution_clock::now();
        time_span = duration_cast<duration<double> >(t2 - t1);
        std::cout << "Yes function hamming distance " << time_span.count() << " seconds.\n";
    
        return 0;
    }
    

    The code for std::bitset can be found in "XOR bitset when 2D bitset is stored as 1D".

    For std::vector<int> I got:

    Build 3.80404 seconds.
    No function hamming distance 0.0322335 seconds.
    Yes function hamming distance 0.0352869 seconds.
    

    For std::vector<char> I got:

    Build 8.2e-07 seconds.
    No function hamming distance 8.4e-08 seconds.
    Yes function hamming distance 2.01e-07 seconds.
    

    For std::vector<bool> I got:

    Build 4.34496 seconds.
    No function hamming distance 0.162005 seconds.
    Yes function hamming distance 0.258315 seconds.
    

    For std::bitset I got:

    Build 4.28947 seconds.
    Hamming distance 0.00385685 seconds.
    

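    std::vector<char> seems to be the winner, though build and distance times below a microsecond for 10^8 elements are implausible, so that measurement is probably not timing the real work.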
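One way to make a benchmark like this harder for the optimizer to discard is to read the results after the timed region. The following is a minimal sketch of that idea, not the code used for the numbers above; `timed_hamming` and `checksum` are hypothetical helper names, and the row-major layout (point i occupies `v[i * d]` to `v[i * d + d - 1]`) is assumed from the program above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Fills dist[i] with the Hamming distance between row i of v and q,
// and returns the elapsed time of the distance loop in seconds.
template <typename T>
double timed_hamming(const std::vector<T>& v, const std::vector<T>& q,
                     std::vector<unsigned>& dist)
{
    const std::size_t d = q.size();
    assert(d > 0 && v.size() % d == 0);
    const std::size_t n = v.size() / d;
    dist.assign(n, 0);

    const auto t1 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < d; ++j)
            dist[i] += (v[j + i * d] != q[j]);
    const auto t2 = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(t2 - t1).count();
}

// Sum of all distances. Printing this value after timing means the
// computed distances are actually used by the program.
inline unsigned long long checksum(const std::vector<unsigned>& dist)
{
    return std::accumulate(dist.begin(), dist.end(), 0ULL);
}
```

Printing `checksum(dist)` next to the elapsed time gives the compiler a reason to keep the loop, and it also allows a quick sanity check that every container type produces the same distances.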

  • 2020-12-21 23:30

    Locality of reference will likely be the driving force. So it's fairly obvious that you should represent the D coordinates of a single point as a contiguous bit vector. std::bitset<D> would be a logical choice.
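As a minimal sketch of that layout, assuming D = 100 as in the question: each point is one `std::bitset<D>`, and the Hamming distance is the number of set bits in the XOR of two points. The names `Point`, `hamming` and `distances_to` are hypothetical and chosen only for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t D = 100;  // number of dimensions (from the question)
using Point = std::bitset<D>;   // one point = D contiguous bits

// Hamming distance: count the positions where a and b differ.
inline std::size_t hamming(const Point& a, const Point& b)
{
    return (a ^ b).count();
}

// Distance from the query q to every stored point, in storage order.
inline std::vector<std::size_t> distances_to(const std::vector<Point>& pts,
                                             const Point& q)
{
    std::vector<std::size_t> out;
    out.reserve(pts.size());
    for (const Point& p : pts)
        out.push_back(hamming(p, q));
    return out;
}
```

Because `std::vector<Point>` stores the bitsets back to back, a scan over all points walks memory sequentially.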

    However, the next important thing to realize is that locality benefits extend easily up to about 4 KB. This means that you should not pick a single point and compare it against all other N-1 points. Instead, group points into sets of 4 KB each and compare those groups against each other. Both approaches are O(N*N), but the second will be much faster.
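The grouping idea can be sketched as a tiled all-pairs loop. This is an illustration under several assumptions that are not in the answer: each point is padded to two 64-bit words (128 bits for D = 100), so a 4 KB tile holds 256 points, and the result is reduced to a count of pairs within a given radius so that the O(N*N) distances need not be stored. `kWords`, `kTile` and `count_close_pairs_tiled` are hypothetical names.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWords = 2;         // 64-bit words per point (assumed padding)
constexpr std::size_t kTileBytes = 4096;  // tile size suggested in the answer
constexpr std::size_t kTile = kTileBytes / (kWords * sizeof(std::uint64_t));

// Hamming distance between two packed points of kWords words each.
inline unsigned hamming(const std::uint64_t* x, const std::uint64_t* y)
{
    unsigned d = 0;
    for (std::size_t w = 0; w < kWords; ++w)
        d += static_cast<unsigned>(std::bitset<64>(x[w] ^ y[w]).count());
    return d;
}

// Counts unordered pairs (i < j) with distance <= radius, comparing one
// tile of points against another tile so both stay cache-resident.
inline std::size_t count_close_pairs_tiled(const std::vector<std::uint64_t>& pts,
                                           unsigned radius)
{
    assert(pts.size() % kWords == 0);
    const std::size_t n = pts.size() / kWords;
    std::size_t count = 0;
    for (std::size_t bi = 0; bi < n; bi += kTile) {
        const std::size_t ei = std::min(bi + kTile, n);
        for (std::size_t bj = bi; bj < n; bj += kTile) {
            const std::size_t ej = std::min(bj + kTile, n);
            for (std::size_t i = bi; i < ei; ++i) {
                // On the diagonal tile start at i + 1 to avoid double counting.
                const std::size_t j0 = (bj == bi) ? i + 1 : bj;
                for (std::size_t j = j0; j < ej; ++j)
                    if (hamming(&pts[i * kWords], &pts[j * kWords]) <= radius)
                        ++count;
            }
        }
    }
    return count;
}
```

The tiled loop visits exactly the same pairs as the naive double loop, only in a different order, so the two can be checked against each other on a small input before timing them.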

    You may be able to beat O(N*N) by use of the triangle inequality: Hamming(a,b) + Hamming(b,c) >= Hamming(a,c). I'm just wondering how. It probably depends on how you want your output. The naive output would be an N*N set of distances, and that's unavoidably O(N*N).
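One possible way to exploit the inequality, offered as a sketch and not as something the answer proposes: if only pairs within some radius r are wanted, then for any pivot p the inequality gives |Hamming(a,p) - Hamming(b,p)| <= Hamming(a,b), so a pair whose pivot distances differ by more than r can be skipped without computing its distance. The pivot choice (the first point), the radius-query output and the name `count_close_pairs_pivot` are all assumptions. For uniformly random points the pivot distances cluster tightly, so the pruning may save little.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <numeric>
#include <vector>

constexpr std::size_t D = 100;
using Point = std::bitset<D>;

inline std::size_t hamming(const Point& a, const Point& b)
{
    return (a ^ b).count();
}

// Counts unordered pairs with Hamming distance <= radius, skipping pairs
// that the pivot bound already rules out.
inline std::size_t count_close_pairs_pivot(const std::vector<Point>& pts,
                                           std::size_t radius)
{
    const std::size_t n = pts.size();
    if (n < 2) return 0;

    // Distance of every point to the pivot (arbitrarily, the first point).
    std::vector<std::size_t> d(n);
    for (std::size_t i = 0; i < n; ++i)
        d[i] = hamming(pts[i], pts[0]);

    // Visit points in order of increasing pivot distance.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&d](std::size_t a, std::size_t b) { return d[a] < d[b]; });

    std::size_t count = 0;
    for (std::size_t a = 0; a < n; ++a) {
        for (std::size_t b = a + 1; b < n; ++b) {
            const std::size_t i = order[a], j = order[b];
            // d[j] >= d[i] here; once the gap exceeds radius, every
            // later j is at least as far, so stop scanning.
            if (d[j] - d[i] > radius) break;
            if (hamming(pts[i], pts[j]) <= radius) ++count;
        }
    }
    return count;
}
```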

  • 2020-12-21 23:36

    If the values are independently, uniformly distributed, and you want to find the Hamming distance between two independently, randomly chosen points, the most efficient layout is a packed array of bits.

    This packed array would ideally be chunked into the largest block size over which your popcnt instruction works: 64 bits. The Hamming distance is then the sum of popcnt(x_blocks[i] ^ y_blocks[i]) over all blocks i. On processors with efficient unaligned accesses, byte alignment with unaligned reads is likely to be most efficient. On processors where unaligned reads incur a penalty, one should consider whether the memory overhead of aligned rows is worth faster logic.
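A minimal sketch of the aligned-rows variant, assuming each point is padded to a whole number of 64-bit words (two words, i.e. 28 unused bits, for D = 100). The helper names `words_per_point`, `set_bit` and `hamming` are hypothetical, and `std::bitset<64>::count()` stands in for the popcnt instruction; whether it compiles to a single instruction depends on the compiler and target flags.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 64-bit blocks needed to hold dims bits.
constexpr std::size_t words_per_point(std::size_t dims)
{
    return (dims + 63) / 64;
}

// Sets coordinate d of point i in a row-major packed array.
inline void set_bit(std::vector<std::uint64_t>& data, std::size_t words,
                    std::size_t i, std::size_t d)
{
    assert(d / 64 < words);
    data[i * words + d / 64] |= std::uint64_t{1} << (d % 64);
}

// Hamming distance: sum of popcount(x[w] ^ y[w]) over all blocks.
inline unsigned hamming(const std::uint64_t* x, const std::uint64_t* y,
                        std::size_t words)
{
    unsigned dist = 0;
    for (std::size_t w = 0; w < words; ++w)
        dist += static_cast<unsigned>(std::bitset<64>(x[w] ^ y[w]).count());
    return dist;
}
```

With this layout, point i starts at `data[i * words]`, so `hamming(&data[i * words], &data[j * words], words)` compares two points with two XORs and two population counts when D = 100. The padding bits stay zero in every point and therefore never contribute to the distance.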
