Why is my algorithm for finding the sum of all prime numbers below 2 million so slow? I'm a fairly beginner programmer and this is what I came up with for finding the solution:
Nobody pointed this out, but using range in Python 2.x is very slow, because it builds the whole list in memory before the loop starts. Use xrange instead; in this case it should give you a huge performance advantage.
See this question.
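If the same script has to run on both Python 2 and Python 3, one common workaround is to rebind the name once at the top of the file. This is only a sketch under that assumption; the loop below is a stand-in for the real work, not the code from the question:

```python
# On Python 2, rebind range to the lazy xrange.
# On Python 3, xrange does not exist and range is already lazy.
try:
    range = xrange  # Python 2 only
except NameError:
    pass  # Python 3: keep the built-in range

# Stand-in loop: iterates 2 million values without building a list first
total = 0
for i in range(2, 2000000):
    total += i
print(total)
```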
Also, you don't have to loop all the way up to the number you are checking; checking up to round(sqrt(n)) + 1 is sufficient. (If a number greater than the square root divides it, the matching cofactor is smaller than the square root, so you must have already noticed it.)
import time
start = time.time()

def is_prime(num):
    prime = True
    for i in range(2,int(num**0.5)+1):
        if num % i == 0:
            prime = False
            break
    return prime

sum_prime = 0
for i in range(2,2000000):
    if is_prime(i):
        sum_prime += i
print("sum: ",sum_prime)
elapsed = (time.time() - start)
print("This code took: " + str(elapsed) + " seconds")
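To convince yourself that stopping at the square root does not miss anything, you can compare it against the check that tries every smaller divisor. This is a minimal sketch; is_prime_full and is_prime_sqrt are names made up for the comparison, and the range of 2000 numbers is an arbitrary small sample:

```python
def is_prime_full(num):
    # Try every possible divisor below num (slow, but obviously complete)
    return num >= 2 and all(num % i != 0 for i in range(2, num))

def is_prime_sqrt(num):
    # Try divisors only up to the integer square root of num
    return num >= 2 and all(num % i != 0 for i in range(2, int(num ** 0.5) + 1))

# Collect every number on which the two checks disagree
mismatches = [n for n in range(2, 2000)
              if is_prime_full(n) != is_prime_sqrt(n)]
print(mismatches)
```

An empty list means the two checks agree on every number in the sample.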
Your algorithm uses trial division, which is very slow. A better algorithm uses the Sieve of Eratosthenes:
def sumPrimes(n):
    sum, sieve = 0, [True] * n
    for p in range(2, n):
        if sieve[p]:
            sum += p
            for i in range(p*p, n, p):
                sieve[i] = False
    return sum

print sumPrimes(2000000)
That should run in less than a second. If you're interested in programming with prime numbers, I modestly recommend this essay at my blog.
You need to use a prime sieve. Check out the Sieve of Eratosthenes and try to implement it in code.
Trial division is very inefficient for finding primes because its complexity is on the order of n squared, so the running time grows very fast. This task is meant to teach you how to find something better.
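To get a feel for the difference, you can count the basic operations each approach performs. The sketch below is only an illustration: both function names are made up, the trial division tests each candidate against every smaller number and stops at the first divisor (so its count stays somewhat below n squared), and n = 2000 is an arbitrary small size:

```python
def trial_division_ops(n):
    # Primes below n by naive trial division, plus the number of modulo operations
    ops = 0
    primes = []
    for candidate in range(2, n):
        prime = True
        for d in range(2, candidate):
            ops += 1
            if candidate % d == 0:
                prime = False
                break
        if prime:
            primes.append(candidate)
    return primes, ops

def sieve_ops(n):
    # Primes below n by the Sieve of Eratosthenes, plus the number of crossings-off
    ops = 0
    sieve = [True] * n
    primes = []
    for p in range(2, n):
        if sieve[p]:
            primes.append(p)
            for multiple in range(p * p, n, p):
                sieve[multiple] = False
                ops += 1
    return primes, ops

n = 2000
primes_a, ops_a = trial_division_ops(n)
primes_b, ops_b = sieve_ops(n)
print("same primes:", primes_a == primes_b)
print("trial division operations:", ops_a)
print("sieve operations:", ops_b)
```

Try doubling n a few times and watch how much faster the first count grows than the second.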
First of all, I think you can split your code by defining a function. However, there is a drawback to using a regular function in this case: every time a normal function returns a value, the next call to the function executes the complete code inside the function again. Since you are iterating 2 million times, it would be better to use the yield command instead of return. After a yield, the next call continues with the instruction that follows the yield instead of going over the whole function again.
I recommend having a look at this article about generators in Python. It provides a more extensive explanation for this example.
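Here is a tiny illustration of that resume-after-yield behavior. It is only a sketch: count_up and trace are names made up for this demo and are not part of the solution.

```python
trace = []  # records what the generator does, in order

def count_up(start):
    # Hypothetical generator, for illustration only
    n = start
    while True:
        trace.append("yielding %d" % n)
        yield n
        # Execution resumes here on the next call to next()
        trace.append("resumed after %d" % n)
        n += 1

gen = count_up(3)
first = next(gen)   # runs up to the first yield
second = next(gen)  # resumes after the yield, loops, and yields again
print(first, second)
print(trace)
```

The trace shows that the second call does not start the function body over; it picks up right after the yield.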
The solution would be something like this:
import math

# Check if a number is prime
def is_prime(number):
    if number > 1:
        if number == 2:
            return True
        if number % 2 == 0:
            return False
        for current in range(3, int(math.sqrt(number) + 1), 2):
            if number % current == 0:
                return False
        return True
    return False

# Get the next after a given number
def get_primes(number):
    while True:
        if is_prime(number):
            yield number
            # Next call to the function will continue here!
        number += 1

# Get the sum of all prime numbers under a number
def sum_primes_under(limit):
    total = 2
    for next_prime in get_primes(3):
        if next_prime < limit:
            total += next_prime
        else:
            print(total)
            return

# Call the function
sum_primes_under(2000000)
This problem gives its output very quickly when you use the Sieve of Eratosthenes (link to it). You can make it even faster with a small modification: consider only the odd numbers, so that you iterate over half of the 2 million numbers. This way you can save a lot of time.
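n = 2000000
ar = [False] * n  # ar[k] becomes True once k is known to be composite
sum = 2           # 2 is the only even prime, so start the sum with it

def mul(a):
    # Mark every multiple of a (starting at 2*a) as composite
    i = 2
    p = i * a
    while p < n:
        ar[p] = True
        i += 1  # Python has no ++ operator; ++i would leave i unchanged
        p = i * a

x = 3  # walk over the odd numbers only
while x < n:
    if not ar[x]:
        sum += x
        mul(x)
    x += 2
print(sum)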
Here you can see the same algorithm in C++:
#include<bits/stdc++.h>
using namespace std;

const int n = 2000000;
bool ar[n];

void mul(int a)
{
    int i = 2;
    int p = i*a;
    while(p < n)
    {
        ar[p] = 1;
        ++i;
        p = i*a;
    }
}

long long sieve()
{
    long long sum = 2;
    for(int i = 3;i < n;i += 2)
    {
        if(ar[i] == 0)
            sum += i,mul(i);
    }
    return sum;
}

int main()
{
    cout<<sieve();
    return 0;
}
C++ runs around 10 times faster than Python in general, and for this algorithm too.
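Coming back to the odd-numbers idea in this answer, it can be pushed one step further by not storing the even numbers at all. The following is a sketch of that variant, not the code above: sum_primes_odd_sieve is a made-up name, and the mapping "index k stands for the odd number 2*k + 1" is an assumption of this sketch.

```python
def sum_primes_odd_sieve(limit):
    # Sum of all primes below limit, sieving odd numbers only
    if limit <= 2:
        return 0
    size = limit // 2               # index k stands for the odd number 2*k + 1
    is_composite = bytearray(size)  # all zeros: nothing crossed off yet
    total = 2                       # 2 is the only even prime
    for k in range(1, size):        # k = 0 stands for 1, which is not prime
        if not is_composite[k]:
            p = 2 * k + 1
            total += p
            # Cross off odd multiples of p from p*p on; consecutive odd
            # multiples are 2*p apart, which is a step of p in index space
            for j in range(p * p // 2, size, p):
                is_composite[j] = 1
    return total

print(sum_primes_odd_sieve(2000000))
```

This halves the memory used by the table and skips the even multiples entirely.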