hamming-distance

XOR bitset when 2D bitset is stored as 1D

隐身守侯 submitted on 2019-12-01 11:10:43
Question: To answer How to store binary data when you only care about speed?, I am trying to write some code to do comparisons, so I want to use std::bitset. However, for a fair comparison, I would like a 1D std::bitset to emulate a 2D one. So instead of having: bitset<3> b1(string("010")); bitset<3> b2(string("111")); I would like to use: bitset<2 * 3> b1(string("010111")); to optimize data locality. However, now I am having a problem with How should I store and compute Hamming distance between binary codes?, as
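The idea of packing several fixed-width rows into one flat bitset can be sketched in Python, using a plain int as a stand-in for std::bitset. This is only an illustration under stated assumptions: the helper names `row` and `hamming_rows` are hypothetical, and row 0 is taken to be the leftmost group of bits in the string form.

```python
def row(bits: int, i: int, width: int, rows: int) -> int:
    """Extract row i from a packed integer (row 0 = leftmost bits of the string form)."""
    shift = (rows - 1 - i) * width          # how far row i sits from the low end
    return (bits >> shift) & ((1 << width) - 1)


def hamming_rows(bits: int, i: int, j: int, width: int, rows: int) -> int:
    """Hamming distance between rows i and j of the packed integer."""
    diff = row(bits, i, width, rows) ^ row(bits, j, width, rows)
    return bin(diff).count("1")             # count the differing bit positions


# Rows "010" and "111" stored as the single 6-bit value "010111".
packed = int("010111", 2)
print(hamming_rows(packed, 0, 1, width=3, rows=2))
```

The same shift-and-mask arithmetic carries over to a flat std::bitset, where the mask and shift select the row before the XOR.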

Finding Minimum hamming distance of a set of strings in python

一个人想着一个人 submitted on 2019-12-01 04:58:56
I have a set of n (~1000000) strings (DNA sequences) stored in a list trans. I have to find the minimum hamming distance of all sequences in the list. I implemented a naive brute force algorithm, which has been running for more than a day and has not yet given a solution. My code is dmin=len(trans[0]) for i in xrange(len(trans)): for j in xrange(i+1,len(trans)): dist=hamdist(trans[i][:-1], trans[j][:-1]) if dist < dmin: dmin = dist Is there a more efficient method to do this? Here hamdist is a function I wrote to find hamming distances. It is def hamdist(str1, str2): diffs = 0 if len(str1) !=
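For reference, the brute-force search described in this question might look like the following in Python 3. It is a sketch, not the asker's code: the original `hamdist` is cut off in the excerpt, so raising `ValueError` on a length mismatch is an assumption, and `min_hamming` is a hypothetical name. The loop is still quadratic in the number of sequences; the only shortcut is stopping once a distance of 0 is found.

```python
from itertools import combinations


def hamdist(s1: str, s2: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        # Assumption: the truncated original does not show how it handles this case.
        raise ValueError("sequences must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def min_hamming(seqs: list[str]) -> int:
    """Smallest pairwise Hamming distance over all pairs (O(n^2) comparisons)."""
    best = len(seqs[0])
    for a, b in combinations(seqs, 2):
        d = hamdist(a, b)
        if d < best:
            best = d
            if best == 0:       # cannot improve on identical sequences
                break
    return best


print(min_hamming(["ACGT", "ACGA", "TTTT"]))
```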

Finding Minimum hamming distance of a set of strings in python

丶灬走出姿态 submitted on 2019-12-01 02:25:22
Question: I have a set of n (~1000000) strings (DNA sequences) stored in a list trans. I have to find the minimum hamming distance of all sequences in the list. I implemented a naive brute force algorithm, which has been running for more than a day and has not yet given a solution. My code is dmin=len(trans[0]) for i in xrange(len(trans)): for j in xrange(i+1,len(trans)): dist=hamdist(trans[i][:-1], trans[j][:-1]) if dist < dmin: dmin = dist Is there a more efficient method to do this? Here hamdist is

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

两盒软妹~` submitted on 2019-12-01 00:42:35
I have two data frames, df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. The Hamming distance between two vectors x and y can be computed, for example, as: x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471, 157122, 866381, 582868, 878) y <- c(356739, 324042, 904133, 959893, 433677, 110269, 576942, 2230, 267130,
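The matching step described here can be sketched in plain Python, treating each data frame as a list of integer rows. This assumes the Hamming distance is the count of positions at which two rows hold different values; `two_best_matches` is a hypothetical helper, not part of e1071.

```python
import heapq


def hamming(x: list[int], y: list[int]) -> int:
    """Count the positions at which two equal-length rows differ."""
    return sum(a != b for a, b in zip(x, y))


def two_best_matches(reference: list[list[int]], new_rows: list[list[int]]) -> list[tuple[int, ...]]:
    """For each new row, return the indices of its two closest reference rows."""
    result = []
    for new_row in new_rows:
        closest = heapq.nsmallest(
            2,
            range(len(reference)),
            key=lambda i: hamming(reference[i], new_row),
        )
        result.append(tuple(closest))   # (best index, second-best index)
    return result


df1 = [[1, 2, 3], [1, 2, 4], [9, 9, 9]]   # reference rows
df2 = [[1, 2, 4], [9, 9, 3]]              # new rows
print(two_best_matches(df1, df2))
```

Using `heapq.nsmallest` avoids sorting every reference row when only the top two are needed.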

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

拈花ヽ惹草 submitted on 2019-11-30 19:31:40
Question: I have two data frames, df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. The Hamming distance between two vectors x and y can be computed, for example, as: x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471,

Fast Hamming distance scoring

可紊 submitted on 2019-11-30 08:43:27
There is a database with N fixed-length strings and a query string q of the same length. The problem is to fetch the first k strings from the database that have the smallest Hamming distance to q. N is small (about 400); the strings are long and fixed in length. The database doesn't change, so we can pre-compute indexes. Queries vary strongly, so caching and/or pre-computation is not an option. There are lots of them per second. We always need k results, even if k-1 results have match 0 (sorting on Hamming distance and taking the first k, so locality sensitive hashing and similar approaches won't do). kd-tree and
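As a baseline for the requirement above (always return exactly k results, ranked by distance), a linear scan with a bounded selection is one straightforward option when N is around 400. The sketch below is an assumption-laden illustration: `top_k` is a hypothetical name, and the strings are compared character by character rather than through any precomputed index.

```python
import heapq


def hamming(s: str, t: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(a != b for a, b in zip(s, t))


def top_k(database: list[str], query: str, k: int) -> list[str]:
    """Return the k database strings closest to the query, nearest first."""
    return heapq.nsmallest(k, database, key=lambda s: hamming(s, query))


db = ["0000", "0011", "1111", "0001"]
print(top_k(db, "0000", 2))
```

Because every entry is scored, the function returns k results regardless of how the distances are distributed.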

mysql hamming distance between two phash

家住魔仙堡 submitted on 2019-11-30 05:10:28
I have a table A with a column 'template_phash', in which I store the phashes generated from 400K images. Now I take a random image and generate a phash from it. How do I query table A for the records whose Hamming distance to that phash is below a threshold value, say 20? I have seen Hamming distance on binary strings in SQL, but couldn't figure it out. I think I need to write a function to achieve this, but how? Both of my phashes are BigInt values, e.g. 7641692061273169067. Please help me make the function so that I could query like SELECT product_id,
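The distance between two 64-bit hashes is the number of set bits in their XOR. A minimal Python sketch of that computation follows; `phash_distance` and `within_threshold` are hypothetical names, and the `(product_id, hash)` row layout is an assumption made for illustration. On the database side, MySQL's `BIT_COUNT()` applied to a bitwise XOR of the two values is the usual counterpart, though the exact query depends on the schema.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keep results in 64 bits, even for signed BIGINT values


def phash_distance(a: int, b: int) -> int:
    """Hamming distance between two 64-bit hashes: popcount of their XOR."""
    return bin((a ^ b) & MASK_64).count("1")


def within_threshold(rows, query_hash: int, threshold: int) -> list:
    """Return ids of rows whose hash is closer to query_hash than the threshold."""
    return [pid for pid, h in rows if phash_distance(h, query_hash) < threshold]


query = 7641692061273169067
rows = [
    (1, query),                    # identical hash, distance 0
    (2, query ^ 0b111),            # 3 bits flipped
    (3, query ^ (2**25 - 1)),      # 25 bits flipped
]
print(within_threshold(rows, query, 20))
```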

Fastest way to get hamming distance for integer array

拟墨画扇 submitted on 2019-11-30 04:51:00
Question: Let a and b be vectors of the same size holding 8-bit integers (0-255). I want to compute the number of bits in which those vectors differ, i.e. the Hamming distance between the vectors formed by concatenating the binary representations of those numbers. For example: a = [127,255] b = [127,240] Using the numpy library: np.bitwise_xor(a,b) # Output: array([ 0, 15]) What I need now is to take the binary representation of each element of the above array and count the number of 1s across all its elements. The above example
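One common way to count the set bits after the XOR is a 256-entry lookup table, since every element fits in a byte. The sketch below uses plain Python lists instead of numpy to stay self-contained; `POPCOUNT` and `hamming_bytes` are hypothetical names.

```python
# Precomputed number of 1 bits for every possible byte value.
POPCOUNT = [bin(v).count("1") for v in range(256)]


def hamming_bytes(a: list[int], b: list[int]) -> int:
    """Bit-level Hamming distance between two equal-length sequences of 8-bit ints."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(POPCOUNT[x ^ y] for x, y in zip(a, b))


print(hamming_bytes([127, 255], [127, 240]))  # XOR gives [0, 15], so 4 bits differ
```

With numpy arrays, the same table can be stored as an array and indexed by the XOR result, which keeps the whole computation vectorized.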

Fast Hamming distance scoring

喜欢而已 submitted on 2019-11-29 13:23:06
Question: There is a database with N fixed-length strings and a query string q of the same length. The problem is to fetch the first k strings from the database that have the smallest Hamming distance to q. N is small (about 400); the strings are long and fixed in length. The database doesn't change, so we can pre-compute indexes. Queries vary strongly, so caching and/or pre-computation is not an option. There are lots of them per second. We always need k results, even if k-1 results have match 0 (sorting on Hamming

Fast computation of pairs with least hamming distance

喜夏-厌秋 submitted on 2019-11-29 07:59:09
Problem: Suppose you have N (~100k-1m) integers/bitstrings, each K (e.g. 256) bits long. The algorithm should return the k pairs with the lowest pairwise Hamming distance. Example: N = 4, K = 8, i1 = 00010011, i2 = 01010101, i3 = 11000000, i4 = 11000011. HammingDistance(i1,i2) = 3, HammingDistance(i1,i3) = 5, HammingDistance(i1,i4) = 3, HammingDistance(i2,i3) = 4, HammingDistance(i2,i4) = 4, HammingDistance(i3,i4) = 2. For k=1 it should return the pair list {(i3,i4)}. For k=3 it should return {(i1,i2), (i1,i4), (i3,i4)}. And so on. Algorithm: The naive implementation computes all pairwise distances, sorts the
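The example above can be reproduced with a short Python sketch of the naive approach: score every pair and keep the k smallest. The function name `k_closest_pairs` is hypothetical, and pairs are reported as zero-based index tuples, so (2, 3) corresponds to (i3, i4). This is the quadratic baseline only, not a faster algorithm.

```python
import heapq
from itertools import combinations


def k_closest_pairs(values: list[int], k: int) -> list[tuple[int, int]]:
    """Return the k index pairs with the lowest pairwise Hamming distance."""
    def distance(pair: tuple[int, int]) -> int:
        i, j = pair
        return bin(values[i] ^ values[j]).count("1")   # popcount of the XOR

    all_pairs = combinations(range(len(values)), 2)
    return heapq.nsmallest(k, all_pairs, key=distance)


# i1..i4 from the example, written as binary literals.
values = [0b00010011, 0b01010101, 0b11000000, 0b11000011]
print(k_closest_pairs(values, 1))
print(k_closest_pairs(values, 3))
```

`heapq.nsmallest` keeps only k candidates at a time, which avoids sorting all N(N-1)/2 pairs, although every pair is still scored once.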