问题
In ruby, what is the most efficient way to calculate the bit difference between two unsigned integers (e.g. the hamming distance)?
Eg, I have integer a = 2323409845 and b = 1782647144.
Their binary representations are:
a = 10001010011111000110101110110101
b = 01101010010000010000100101101000
The bit difference between the a & b is 17..
I can do a logical XOR on them, but that will give me a different integer != 17, I would then have to iterate through the binary representation of the result and tally the # of 1s.
What is the most efficient way to calculate the bit difference?
Now, does the answer change for calculating the bit difference of sequences of many ints? E.g. given 2 sequences of unsigned integers:
x = {2323409845,641760420,509499086....}
y = {uint,uint,uint...}
What is the most efficient way to calculate the bit difference between the two sequences?
Would you iterate through the sequence, or is there a faster way to calculate the difference across the entire sequence at once?
回答1:
You can make use of the optimized String functions in Ruby to do the bit counting, instead of pure arithmetic. It turns out to be about 6 times faster with some quick benchmarking.
def h2(a, b)
(a^b).to_s(2).count("1")
end
h1 is the normal way to calculate, while h2 converts the xor into a string, and counts the number of "1"s
Benchmark:
ruby-1.9.2-p180:001:0>> def h1(a, b)
ruby-1.9.2-p180:002:1*> ret = 0
ruby-1.9.2-p180:003:1*> xor = a ^ b
ruby-1.9.2-p180:004:1*> until xor == 0
ruby-1.9.2-p180:005:2*> ret += 1
ruby-1.9.2-p180:006:2*> xor &= xor - 1
ruby-1.9.2-p180:007:2*> end
ruby-1.9.2-p180:008:1*> ret
ruby-1.9.2-p180:009:1*> end
# => nil
ruby-1.9.2-p180:010:0>> def h2(a, b)
ruby-1.9.2-p180:011:1*> (a^b).to_s(2).count("1")
ruby-1.9.2-p180:012:1*> end
# => nil
ruby-1.9.2-p180:013:0>> h1(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:014:0>> h2(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:015:0>> quickbench(10**5) { h1(2323409845, 1782647144) }
Rehearsal ------------------------------------
2.060000 0.000000 2.060000 ( 1.944690)
--------------------------- total: 2.060000sec
user system total real
1.990000 0.000000 1.990000 ( 1.958056)
# => nil
ruby-1.9.2-p180:016:0>> quickbench(10**5) { h2(2323409845, 1782647144) }
Rehearsal ------------------------------------
0.340000 0.000000 0.340000 ( 0.333673)
--------------------------- total: 0.340000sec
user system total real
0.320000 0.000000 0.320000 ( 0.326854)
# => nil
ruby-1.9.2-p180:017:0>>
回答2:
Per the suggestion of mu is too short, I wrote a simple C extension to use __builtin_popcount , and using benchmark verified that it is at least 3X faster than ruby's optimized string functions..
I looked at the following two tutorials:
- Extending Ruby With C
- Ruby Extension in C in 5 min
In my program:
require './FastPopcount/fastpopcount.so'
include FastPopcount
def hamming(a,b)
popcount(a^b)
end
Then in the dir containing my program, I create a folder "PopCount" with the following files.
extconf.rb:
# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'
# Give it a name
extension_name = 'fastpopcount'
# The destination
dir_config(extension_name)
# Do the work
create_makefile(extension_name)
popcount.c:
// Include the Ruby headers and goodies
#include "ruby.h"
// Defining a space for information and references about the module to be stored internally
VALUE FastPopcount = Qnil;
// Prototype for the initialization method - Ruby calls this, not you
void Init_fastpopcount();
// Prototype for our method 'popcount' - methods are prefixed by 'method_' here
VALUE method_popcount(int argc, VALUE *argv, VALUE self);
// The initialization method for this module
void Init_fastpopcount() {
FastPopcount = rb_define_module("FastPopcount");
rb_define_method(FastPopcount, "popcount", method_popcount, 1);
}
// Our 'popcount' method.. it uses the builtin popcount
VALUE method_popcount(int argc, VALUE *argv, VALUE self) {
return INT2NUM(__builtin_popcount(NUM2UINT(argv)));
}
Then in the popcount directory run:
ruby extconf.rb make
Then run the program, and there you have it....fastest way to do hamming distance in ruby.
回答3:
An algorithm of Wegner:
def hamm_dist(a, b)
dist = 0
val = a ^ b
while not val.zero?
dist += 1
val &= val - 1
end
dist
end
p hamm_dist(2323409845, 1782647144) # => 17
回答4:
If one intends to follow c-based path, it is a good idea to add the compiler flag -msse4.2
to your makefile. This allows the compiler to generate hardware based popcnt
instructions instead of using a table to generate the popcount. On my system this was approximately 2.5x faster.
来源:https://stackoverflow.com/questions/6395165/most-efficient-way-to-calculate-hamming-distance-in-ruby