hamming-distance

Most efficient way to calculate hamming distance in ruby?

↘锁芯ラ submitted on 2019-11-29 07:21:17
Question: In Ruby, what is the most efficient way to calculate the bit difference between two unsigned integers (i.e. the Hamming distance)? For example, I have the integers a = 2323409845 and b = 1782647144. Their binary representations are:
a = 10001010011111000110101110110101
b = 01101010010000010000100101101000
The bit difference between a and b is 17. I can do a logical XOR on them, but that will give me a different integer != 17; I would then have to iterate through the binary representation of the result
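The XOR-then-count idea mentioned in the question can be sketched as follows. This is a minimal illustration in Python rather than Ruby, and the function name `hamming_distance` is chosen here for the example; it is not taken from the original thread.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    # XOR leaves a 1 exactly where the two inputs disagree,
    # so counting the 1 bits of the result gives the distance.
    return bin(a ^ b).count("1")


a = 2323409845
b = 1782647144
print(hamming_distance(a, b))  # 17, the value stated in the question
```

The XOR result itself is not the distance; the distance is the number of set bits in it, which is why the question's "different integer != 17" still needs a bit count.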

mysql hamming distance between two phash

情到浓时终转凉″ submitted on 2019-11-29 02:56:45
Question: I have a table A with a column 'template_phash', in which I store the pHashes generated from 400K images. Now I take a random image and generate a pHash from it. How do I query table A for the records whose Hamming distance to that pHash is less than a threshold value, say 20? I have seen "Hamming distance on binary strings in SQL", but couldn't figure it out. I think I need to write a function to achieve this, but how? Both of my pHashes are BigInt values, e.g.:

Hamming Distance / Similarity searches in a database

╄→гoц情女王★ submitted on 2019-11-28 09:30:17
I have a process, similar to TinEye, that generates perceptual hashes; these are 32-bit ints. I intend to store these in a SQL database (maybe a NoSQL DB) in the future. However, I'm stumped as to how I would retrieve records based on the similarity of hashes. Any ideas?
A common approach (at least common to me) is to divide your hash bit string into several chunks and query on these chunks for an exact match. This is a "pre-filter" step. You can then perform a bitwise Hamming distance computation on the returned results, which should be only a small subset of your overall dataset. This
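The chunk-based pre-filter described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the answer's actual implementation: the split into four 8-bit chunks, the dictionary standing in for database columns, and the names `split_chunks`, `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

CHUNKS = 4        # assumption: a 32-bit hash is split into four chunks
CHUNK_BITS = 8    # assumption: each chunk is 8 bits wide
MASK = (1 << CHUNK_BITS) - 1


def split_chunks(h):
    """Split a 32-bit hash into CHUNKS pieces, lowest bits first."""
    return [(h >> (i * CHUNK_BITS)) & MASK for i in range(CHUNKS)]


def build_index(hashes):
    """Map (chunk position, chunk value) to the set of hashes containing it."""
    index = defaultdict(set)
    for h in hashes:
        for pos, chunk in enumerate(split_chunks(h)):
            index[(pos, chunk)].add(h)
    return index


def search(index, query, max_dist):
    """Pre-filter on exact chunk matches, then verify with a full bit count."""
    candidates = set()
    for pos, chunk in enumerate(split_chunks(query)):
        candidates |= index.get((pos, chunk), set())
    return sorted(h for h in candidates
                  if bin(h ^ query).count("1") <= max_dist)


stored = [0x12345678, 0x12345679, 0x1234567B, 0xFFFFFFFF]
index = build_index(stored)
print([hex(h) for h in search(index, 0x12345678, 2)])
```

By the pigeonhole principle, two hashes that differ in fewer bits than there are chunks must agree on at least one whole chunk, so with four chunks this pre-filter cannot miss a match at distance 3 or less. For larger thresholds it can miss results unless more chunks are used.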

Similar image search by pHash distance in Elasticsearch

寵の児 submitted on 2019-11-28 03:15:59
Similar image search problem
Millions of images are pHash'ed and stored in Elasticsearch. The format is "11001101...11" (length 64), but it can be changed (preferably not). Given a subject image's hash "100111..10", we want to find all similar image hashes in the Elasticsearch index within a Hamming distance of 8. Of course, the query can return images with a distance greater than 8, and a script in Elasticsearch or outside can filter the result set. But the total search time must be within 1 second or so.
Our current mapping
Each document has a nested images field that contains image hashes: { "images": { "type": "nested",

Fast computation of pairs with least hamming distance

假装没事ソ submitted on 2019-11-28 01:36:26
Question:
Problem
Suppose you have N (~100k-1m) integers/bitstrings, each K (e.g. 256) bits long. The algorithm should return the k pairs with the lowest pairwise Hamming distance.
Example
N = 4
K = 8
i1 = 00010011
i2 = 01010101
i3 = 11000000
i4 = 11000011
HammingDistance(i1,i2) = 3
HammingDistance(i1,i3) = 5
HammingDistance(i1,i4) = 3
HammingDistance(i2,i3) = 4
HammingDistance(i2,i4) = 4
HammingDistance(i3,i4) = 2
For k=1 it should return the pairlist {(i3,i4)}. For k=3 it should return {(i1,i2), (i1
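As a reference point for the example above, here is a brute-force baseline that scores every pair and keeps the k smallest. It is only a sketch for checking results on small inputs: it does O(N²) distance computations, so it is not the fast algorithm the question asks for, and the name `closest_pairs` is made up for this illustration.

```python
from heapq import nsmallest
from itertools import combinations


def closest_pairs(values, k):
    """Return the k index pairs (i, j), i < j, with the smallest Hamming distance."""
    scored = (
        (bin(values[i] ^ values[j]).count("1"), i, j)
        for i, j in combinations(range(len(values)), 2)
    )
    # Ties are broken by the index pair, because tuples compare element-wise.
    return [(i, j) for _, i, j in nsmallest(k, scored)]


# The four 8-bit values i1..i4 from the example, at 0-based indices 0..3.
values = [0b00010011, 0b01010101, 0b11000000, 0b11000011]
print(closest_pairs(values, 1))  # [(2, 3)], i.e. the pair (i3, i4)
```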

Generate all sequences of bits within Hamming distance t

不问归期 submitted on 2019-11-27 16:20:25
Given a vector of bits v, compute the collection of bit vectors that have Hamming distance 1 from v, then distance 2, and so on up to an input parameter t. So for 011 I should get:
111 001 010 (3 choose 1 in number)
101 000 110 (3 choose 2)
100 (3 choose 3)
How can this be computed efficiently? The vector won't always be of dimension 3; e.g. it could be 6. This will run numerous times in my real code, so some efficiency would be welcome as well (even at the cost of more memory).
My attempt:
#include <iostream>
#include <vector>
void print(const std::vector<char>& v, const int idx, const char new
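One straightforward way to enumerate these vectors is to choose, for each distance d, every set of d positions to flip. The sketch below does that in Python rather than the C++ of the question; it represents the vector as a string of '0' and '1' characters, and the name `within_hamming_distance` is hypothetical. It is not tuned for the speed the question asks about.

```python
from itertools import combinations


def within_hamming_distance(bits, t):
    """Yield (distance, bitstring) for every string at distance 1..t from bits."""
    n = len(bits)
    for d in range(1, min(t, n) + 1):
        # Each choice of d positions to flip gives exactly one string at distance d.
        for positions in combinations(range(n), d):
            flipped = list(bits)
            for p in positions:
                flipped[p] = "1" if flipped[p] == "0" else "0"
            yield d, "".join(flipped)


for d, s in within_hamming_distance("011", 3):
    print(d, s)
```

For each d this produces "n choose d" strings, matching the counts in the question; the order within a distance follows the order of `itertools.combinations`, so it may differ from the order listed above.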

Shortest path to transform one word into another

大城市里の小女人 submitted on 2019-11-27 06:38:33
For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example: cat -> bat -> bet -> bot -> bog -> dog. I've solved the problem using a breadth-first search, but am seeking something better (I represented the dictionary with a trie). Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred. I asked one of my friends (he's a junior) and he said that there is no
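For reference, the breadth-first search baseline the question mentions can be sketched as below. This is the plain approach, not the "something better" being asked for; it uses a Python set instead of a trie, and the six-word list is a toy stand-in for the Scrabble dictionary. The name `word_ladder` is chosen for this example.

```python
from collections import deque
from string import ascii_lowercase


def word_ladder(start, goal, words):
    """Shortest one-letter-at-a-time path from start to goal, or None."""
    words = set(words)
    previous = {start: None}   # predecessor map; doubles as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk the predecessor links back to the start.
            path = []
            while word is not None:
                path.append(word)
                word = previous[word]
            return path[::-1]
        # Try every single-letter substitution at every position.
        for i in range(len(word)):
            for letter in ascii_lowercase:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in words and candidate not in previous:
                    previous[candidate] = word
                    queue.append(candidate)
    return None


toy_words = ["cat", "bat", "bet", "bot", "bog", "dog"]  # hypothetical word list
print(word_ladder("cat", "dog", toy_words))
# ['cat', 'bat', 'bot', 'bog', 'dog']
```

On this toy list the search finds a four-step path, one step shorter than the five-step example in the question, because "bat" and "bot" already differ by a single letter.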

Hamming distance on binary strings in SQL

三世轮回 submitted on 2019-11-27 03:41:25
I have a table in my DB where I store SHA256 hashes in a BINARY(32) column. I'm looking for a way to compute the Hamming distance of the entries in the column to a supplied value, i.e. something like:
SELECT * FROM table ORDER BY HAMMINGDISTANCE(hash, UNHEX(<insert supplied sha256 hash here>)) ASC LIMIT 10
(In case you're wondering, the Hamming distance of strings A and B is defined as BIT_COUNT(A^B), where ^ is the bitwise XOR operator and BIT_COUNT returns the number of 1s in the binary string.) Now, I know that both the ^ operator and BIT_COUNT function only work on INTEGERs and so I'd say
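The definition BIT_COUNT(A^B) extends to values wider than one integer by splitting both values into fixed-size words and summing the per-word bit counts. The sketch below shows that idea in Python, not SQL; the 8-byte word size is an assumption chosen to mirror a 64-bit integer, and `hamming_distance_bytes` is a hypothetical name.

```python
import hashlib


def hamming_distance_bytes(x: bytes, y: bytes, word_size: int = 8) -> int:
    """Hamming distance of two equal-length byte strings, summed word by word."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    total = 0
    for offset in range(0, len(x), word_size):
        # Treat each slice as one unsigned integer, XOR, and count the 1 bits.
        wx = int.from_bytes(x[offset:offset + word_size], "big")
        wy = int.from_bytes(y[offset:offset + word_size], "big")
        total += bin(wx ^ wy).count("1")
    return total


h1 = hashlib.sha256(b"first").digest()    # 32 bytes, like a BINARY(32) value
h2 = hashlib.sha256(b"second").digest()
print(hamming_distance_bytes(h1, h2))
```

A 32-byte hash splits into four 8-byte words, so the total is the sum of four XOR-and-count steps.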

Hamming Distance / Similarity searches in a database

放肆的年华 submitted on 2019-11-27 01:48:08
Question: I have a process, similar to TinEye, that generates perceptual hashes; these are 32-bit ints. I intend to store these in a SQL database (maybe a NoSQL DB) in the future. However, I'm stumped as to how I would retrieve records based on the similarity of hashes. Any ideas?
Answer 1: A common approach (at least common to me) is to divide your hash bit string into several chunks and query on these chunks for an exact match. This is a "pre-filter" step. You then can perform a bitwise hamming

Similar image search by pHash distance in Elasticsearch

徘徊边缘 submitted on 2019-11-26 23:57:38
Question:
Similar image search problem
Millions of images are pHash'ed and stored in Elasticsearch. The format is "11001101...11" (length 64), but it can be changed (preferably not). Given a subject image's hash "100111..10", we want to find all similar image hashes in the Elasticsearch index within a Hamming distance of 8. Of course, the query can return images with a distance greater than 8, and a script in Elasticsearch or outside can filter the result set. But the total search time must be within 1 second or so.
Our current mapping