I have a table A with a column 'template_phash', in which I store the phashes generated from 400K images.
Now I take a random image and generate a phash from that image.
How do I query table A for the records whose Hamming distance from this phash is less than a threshold value, say 20?
I have seen "Hamming distance on binary strings in SQL", but couldn't figure it out.
I think I need to write a function to achieve this, but how?
Both of my phashes are stored as BIGINT, e.g. 7641692061273169067.
Please help me write the function so that I can run a query like this:
SELECT product_id, HAMMING_DISTANCE(phash1, phash2) AS hd FROM A
HAVING hd < 20 ORDER BY hd ASC;
I figured out that the Hamming distance is just the number of bits that differ between the two hashes. First XOR the two hashes, then count the ones in the result:
SELECT product_id, BIT_COUNT(phash1 ^ phash2) AS hd FROM A ORDER BY hd ASC;
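To sanity-check the XOR-then-count idea outside the database, here is a minimal Python sketch of the same logic. The names `hamming_distance`, `rows`, `query_hash` and `threshold` are made up for illustration, and the sample rows are hypothetical, not real data from table A. It also assumes the hashes fit in 64 bits, as a BIGINT does.

```python
MASK64 = (1 << 64) - 1  # treat hashes as 64-bit values, like a BIGINT column


def hamming_distance(phash1: int, phash2: int) -> int:
    """Count the bit positions in which two 64-bit hashes differ."""
    diff = (phash1 ^ phash2) & MASK64  # XOR leaves a 1 wherever the bits differ
    return bin(diff).count("1")        # count those 1 bits


query_hash = 7641692061273169067  # phash of the image being looked up
threshold = 20

# Hypothetical rows: (product_id, template_phash)
rows = [
    (1, query_hash),           # identical hash
    (2, query_hash - 1),       # differs only in the lowest bit
    (3, query_hash ^ MASK64),  # every bit flipped
]

# Keep rows under the threshold, ordered by distance (closest first).
matches = sorted(
    (hamming_distance(phash, query_hash), product_id)
    for product_id, phash in rows
    if hamming_distance(phash, query_hash) < threshold
)
print(matches)
```

Here the third row is filtered out because all 64 bits differ. The first two come back ordered by distance, which is what the threshold filter plus ORDER BY should do in the query.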