问题
I've been working through Project Euler and Sphere Online Judge problems. In this particular problem, I have to find all the prime numbers within two given numbers. I have a function that looks promising (based on the Sieve of Eratosthenes), except it's too slow. Can someone spot what is slowing my function down so much, and hint at how I can fix it? Also, some comments about how to approach optimization in general (or links to such comments/books/articles etc,) would be greatly appreciated.
Code:
def ranged_sieve(l, b)
primes = (l..b).to_a
primes[0]=nil if primes[0] < 2
(2..Math.sqrt(b).to_i).each do |counter|
step_from = l / counter
step_from = step_from * counter
l > 3 ? j = step_from : j = counter + counter
(j..b).step(counter) do |stepped|
index = primes.index(stepped)
primes[index] = nil if index
end
end
primes.compact
end
回答1:
I haven't looked fully, but one factor is that, you are replacing a certain value in primes
with nil
, and later compact
-ing it to remove them. This is a waste. Just by doing that directly with delete_at
makes it more than twice fast:
def ranged_sieve2(l, b)
primes = (l..b).to_a
primes.delete_at(0) if primes[0] < 2
(2..Math.sqrt(b).to_i).each do |counter|
step_from = l / counter
step_from = step_from * counter
l > 3 ? j = step_from : j = counter + counter
(j..b).step(counter) do |stepped|
index = primes.index(stepped)
primes.delete_at(index) if index
end
end
primes
end
ranged_sieve(1, 100) # => Took approx 8e-4 seconds on my computer
ranged_sieve2(1, 100) # => Took approx 3e-4 seconds on my computer
Another point to improve is that, using a hash is much faster than array as the relevant size gets larger. Replacing your array implementation with a hash, you can get this:
def ranged_sieve3(l, b)
primes = (l..b).inject({}){|h, i| h[i] = true; h}
primes.delete(0)
primes.delete(1)
(2..Math.sqrt(b).to_i).each do |counter|
step_from = l / counter
step_from = step_from * counter
l > 3 ? j = step_from : j = counter + counter
(j..b).step(counter) do |stepped|
primes.delete(stepped)
end
end
primes.keys
end
When you do range_sieve3(1, 100)
with this, it is slower than range_sieve2(1, 100)
because of the overhead. But as you make the number larger, the superiority becomes salient. For example, I got this result on my computer:
ranged_sieve(1, 1000) # => Took 1e-01 secs
ranged_sieve2(1, 1000) # => Took 3e-02 secs
ranged_sieve3(1, 1000) # => Took 8e-04 secs
回答2:
The PRIME1 problem at SPOJ (Sphere Online Judges) is designed so that you cannot simply sieve up to the upper limit, because in that case you will get hit by the timeout.
One possible approach is superior speed; by adding a few bells and whistles to the standard sieve it can be made to run fast enough to stay well below the timeout limit. Speed optimisations include:
- representing only the odd integers in the sieve (50% space savings)
- sieving in small, cache-friendly segments that fit into the L1 cache (32 KByte)
- presieving by small primes (i.e. blasting a precomputed pattern over the sieve segment)
- remembering last (or next) working offset for each prime across segments, instead of recomputing them using slow divisions
Putting all this together cuts the time for sieving the full 2^32 range from something like 20 seconds down to 2 seconds, well below the SPOI timeout. My pastebin has three simple C++ demo programs where you can inspect each of these optimisations in action and see their effect.
A much simpler approach is to do only the work that is necessary: sieve up to the square root of the last number of the target range to get all potential prime factors, and then sieve only the target range itself. That way you can solve to SPOJ problem in less than two dozen lines of code and a few milliseconds runtime. I just finished a demo .cpp for this type of segmented sieving (the difficult part was not the sieve but the test frame for comfortable testing, and the verification of proper operation up to 2^64-1 since there is hardly any reference data).
The sieve itself looks like this; the sieve is an odds-only packed bitmap, and the sieve range is specified in bits for robustness (it's all explained in the .cpp), so you would pass (range_start / 2) for offset
:
unsigned char odd_composites32[UINT32_MAX / (2 * CHAR_BIT) + 1]; // the small factor sieve
uintxx_t sieved_bits = 0; // how far it's been initialised
void extend_factor_sieve_to_cover (uintxx_t max_factor_bit); // bit, not number!
void sieve32 (unsigned char *target_segment, uint64_t offset, uintxx_t bit_count)
{
assert( bit_count > 0 && bit_count <= UINT32_MAX / 2 + 1 );
uintxx_t max_bit = bit_count - 1;
uint64_t max_num = 2 * (offset + max_bit) + 1;
uintxx_t max_factor_bit = (max_factor32(max_num) - 1) / 2;
if (target_segment != odd_composites32)
{
extend_factor_sieve_to_cover(max_factor_bit);
}
std::memset(target_segment, 0, std::size_t((max_bit + CHAR_BIT) / CHAR_BIT));
for (uintxx_t i = 3u >> 1; i <= max_factor_bit; ++i)
{
if (bit(odd_composites32, i)) continue;
uintxx_t n = (i << 1) + 1; // the actual prime represented by bit i (< 2^32)
uintxx_t stride = n; // == (n * 2) / 2
uint64_t start = (uint64_t(n) * n) >> 1;
uintxx_t k;
if (start >= offset)
{
k = uintxx_t(start - offset);
}
else // start < offset
{
uintxx_t before_the_segment = (offset - start) % stride;
k = before_the_segment == 0 ? 0 : stride - before_the_segment;
}
while (k <= max_bit)
{
set_bit(target_segment, k);
// k can wrap since strides go up to almost 2^32
if ((k += stride) < stride)
{
break;
}
}
}
}
For the SPOJ problem - numbers less than 2^32 - unsigned integers are sufficient for all variables (i.e. uint32_t instead of uintxx_t and uint64_t) and some things could be simplified further. Also, you can use sqrt()
instead of max_factor32()
for these small ranges.
In the demo code, extend_factor_sieve_to_cover()
does the moral equivalent of sieve32(odd_composites32, 0, max_factor_bit + 1)
in small, cache-friendly steps. For the SPOJ problem you can simply use the single sieve32() call since there are only 6541 small odd prime factors in numbers less than 2^32, which you can sieve in no time flat.
Hence the trick to solving this SPOJ problem is using segmented sieving, which cuts total runtime to a few milliseconds.
来源:https://stackoverflow.com/questions/14003876/how-do-i-efficiently-sieve-through-a-selected-range-for-prime-numbers