I have N points in D dimensions, where let's say N is 1 million and D is 100. All my points have binary coordinates, i.e. they lie in {0, 1}^D, and I am only interested in speed.
Locality of reference will likely be the driving force, so it's fairly obvious that you should represent the D coordinates of a single point as a contiguous bitvector. std::bitset would be a logical choice.
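For example, with D = 100 each point fits in a couple of machine words, and the Hamming distance between two points is just an XOR followed by a popcount. A minimal sketch (the constant D and the helper name are mine, not part of the question):

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t D = 100;          // dimensionality from the question
using Point = std::bitset<D>;           // one point = one contiguous bitvector

// Hamming distance: XOR the two bitvectors, then count the set bits.
// bitset::count() typically compiles down to popcount instructions.
std::size_t hamming(const Point& a, const Point& b)
{
    return (a ^ b).count();
}
```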
However, the next important thing to realize is that locality benefits come easily for working sets up to about 4KB. That means you should not pick a single point and compare it against all other N-1 points one at a time; instead, group the points into blocks of roughly 4KB each and compare block against block. Both approaches are O(N*N), but the second will be much faster.
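A hedged sketch of that blocked loop, reusing the Point and hamming helpers above. The block size of 256 points (256 points * ~16 bytes per std::bitset<100> is about 4KB) is my own choice to match the 4KB figure, and the dense output vector is only for illustration; for N = 1 million the full distance matrix would not fit in memory, the point here is the loop structure:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t BLOCK = 256;      // roughly 4KB worth of points per block

// Compare block against block so that both blocks stay cache-resident
// while the inner loops sweep every pair between them.
void all_pairs_blocked(const std::vector<Point>& pts,
                       std::vector<std::uint8_t>& dist)   // row-major N*N, D <= 255
{
    const std::size_t n = pts.size();
    for (std::size_t i0 = 0; i0 < n; i0 += BLOCK) {
        for (std::size_t j0 = 0; j0 < n; j0 += BLOCK) {
            const std::size_t iEnd = std::min(i0 + BLOCK, n);
            const std::size_t jEnd = std::min(j0 + BLOCK, n);
            for (std::size_t i = i0; i < iEnd; ++i)
                for (std::size_t j = j0; j < jEnd; ++j)
                    dist[i * n + j] =
                        static_cast<std::uint8_t>(hamming(pts[i], pts[j]));
        }
    }
}
```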
You may be able to beat O(N*N) by using the triangle inequality: Hamming(a,b) + Hamming(b,c) >= Hamming(a,c). I'm just not sure how; it probably depends on what output you want. The naive output would be the full N*N set of distances, and producing that is unavoidably O(N*N).
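One hedged illustration of how the inequality could pay off, assuming you only need pairs within some Hamming radius r rather than all N*N distances: precompute every point's distance to a pivot and sort by it; since Hamming(a,c) >= |Hamming(a,p) - Hamming(c,p)|, a pair can only qualify if their pivot distances differ by at most r, so each inner scan can stop early. The pivot choice, threshold, and names below are all illustrative, not something from the question:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Report all pairs with Hamming distance <= r, pruning via a single pivot.
void pairs_within_r(const std::vector<Point>& pts, std::size_t r)
{
    const std::size_t n = pts.size();
    if (n == 0) return;
    const Point& pivot = pts[0];        // arbitrary pivot choice (assumption)

    // Distance of every point to the pivot.
    std::vector<std::size_t> toPivot(n);
    for (std::size_t i = 0; i < n; ++i)
        toPivot[i] = hamming(pts[i], pivot);

    // Order points by their pivot distance.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return toPivot[a] < toPivot[b]; });

    for (std::size_t a = 0; a < n; ++a) {
        for (std::size_t b = a + 1; b < n; ++b) {
            const std::size_t i = order[a], j = order[b];
            // Triangle inequality: if the pivot distances already differ by
            // more than r, so does Hamming(i,j); later b's differ even more.
            if (toPivot[j] - toPivot[i] > r) break;
            if (hamming(pts[i], pts[j]) <= r) {
                // report pair (i, j) here
            }
        }
    }
}
```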