Query points on the vertices of a Hamming cube

后端 未结 1 1506
悲哀的现实
悲哀的现实 2021-01-17 03:44

I have N points that lie only on the vertices of a cube, of dimension D, where D is something like 3.

A vertex may not contain any point. So every point has coordina

相关标签:
1条回答
  • 2021-01-17 04:32

    There is a nice bithack to go from one bitmask with k bits set to the lexicographically next permutation, which means it's fairly simple to loop through all masks with k bits set. XORing these masks with an initial value gives all the values at hamming distance exactly k away from it.

    So for D dimensions, where D is less than 32 (otherwise change the types),

    uint32_t limit = (1u << D) - 1;
    for (int k = 1; k <= r; k++) {
        uint32_t diff = (1u << k) - 1;
        while (diff <= limit) {
            // v is the input vertex
            uint32_t vertex = v ^ diff;
            // use it
            diff = nextBitPermutation(diff);
        }
    }
    

    Where nextBitPermutation may be implemented in C++ as something like (if you have __builtin_ctz)

    uint32_t nextBitPermutation(uint32_t v) {
        // see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
        uint32_t t = v | (v - 1);
        return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
    }
    

    Or for MSVC (not tested)

    uint32_t nextBitPermutation(uint32_t v) {
        // see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
        uint32_t t = v | (v - 1);
        unsigned long tzc;
        _BitScanForward(&tzc, v); // v != 0 so the return value doesn't matter
        return (t + 1) | (((~t & -~t) - 1) >> (tzc + 1));
    }
    

    If D is really low, 4 or lower, the old popcnt-with-pshufb works really well and generally everything just lines up well, like this:

    uint16_t query(int vertex, int r, int8_t* validmask)
    {
        // validmask should be array of 16 int8_t's,
        // 0 for a vertex that doesn't exist, -1 if it does
        __m128i valid = _mm_loadu_si128((__m128i*)validmask);
        __m128i t0 = _mm_set1_epi8(vertex);
        __m128i r0 = _mm_set1_epi8(r + 1);
        __m128i all =        _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        __m128i popcnt_lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2,  2,  3,  2,  3,  3,  4);
        __m128i dist = _mm_shuffle_epi8(popcnt_lut, _mm_xor_si128(t0, all));
        __m128i close_enough = _mm_cmpgt_epi8(r0, dist);
        __m128i result = _mm_and_si128(close_enough, valid);
        return _mm_movemask_epi8(result);
    }
    

    This should be fairly fast; fast compared to the bithack above (nextBitPermutation, which is fairly heavy, is used a lot there) and also compared to looping over all vertices and testing whether they are in range (even with builtin popcnt, that automatically takes at least 16 cycles and the above shouldn't, assuming everything is cached or even permanently in a register). The downside is the result is annoying to work with, since it's a mask of which vertices both exist and are in range of the queried point, not a list of them. It would combine well with doing some processing on data associated with the points though.

    This also scales down to D=3 of course, just make none of the points >= 8 valid. D>4 can be done similarly but it takes more code then, and since this is really a brute force solution that is only fast due to parallelism it fundamentally gets slower exponentially in D.

    0 讨论(0)
提交回复
热议问题