I recently wrote a short algorithm to calculate happy numbers in Python. The program lets you pick an upper bound and determines all the happy numbers below it.
I am not an expert at C++ optimization, but I believe the speed difference may be due to the fact that Python lists preallocate extra space up front, while your C++ vectors must reallocate, and possibly copy, every time they grow.
As for GMan's comment about find, I believe that the Python "in" operator is also a linear search and is the same speed.
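To make the point about linear search concrete, here is a minimal Python sketch. It is not taken from the question's code: the `Counted` class and the `comparisons_for_lookup` helper are hypothetical names invented for this illustration, and the counts assume CPython's usual behaviour. It counts how many equality checks `in` performs on a list and, for contrast, on a set.

```python
class Counted:
    """Wraps an int and counts every equality check made on any instance."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for_lookup(container, needle):
    """Return (found, number of == checks) for `needle in container`."""
    Counted.comparisons = 0
    found = needle in container
    return found, Counted.comparisons


items = [Counted(i) for i in range(1000)]
missing = Counted(1000)  # not present, so a list has to be scanned to the end

list_found, list_checks = comparisons_for_lookup(items, missing)
set_found, set_checks = comparisons_for_lookup(set(items), missing)

print("list:", list_found, list_checks)
print("set: ", set_found, set_checks)
```

For a list, the number of checks grows with the number of elements, which is the same cost profile as a linear `find` over a vector.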
Edit
Also, I just noticed that you rolled your own pow function. There is no need to do that, and the standard library version is likely faster.
I can see that you have quite a few unnecessary heap allocations.
For example:
while(!next)
{
char* buffer = new char[10];
This doesn't look very optimized. You probably want to pre-allocate the array once and reuse it inside your loop. This is a basic optimization technique that is easy to spot and to apply. It can also turn into a mess, so be careful with it.
You are also using the atoi() function, and I don't really know how well optimized it is. Taking the value modulo 10 to get each digit might be better (you have to measure, though; I didn't test this).
The fact that you have a linear search (inVector) might be bad. Replacing the vector data structure with a std::set might speed things up. A hash_set could do the trick too.
But I think the worst problem is the string and the heap allocation inside that loop. That doesn't look good. I would look at those places first.
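As a rough illustration of two of the suggestions above (extracting digits with modulus instead of going through a string, and using a hashed set instead of a linear search), here is a small Python sketch. It is not the question's code; `digit_square_sum` and `is_happy` are names invented for this example.

```python
def digit_square_sum(n):
    """Sum the squares of the decimal digits of n using only integer math."""
    total = 0
    while n:
        n, digit = divmod(n, 10)  # peel off the least significant digit
        total += digit * digit
    return total


def is_happy(n):
    """Return True if repeatedly applying digit_square_sum reaches 1."""
    seen = set()  # hashed container: membership tests do not scan every item
    while n != 1 and n not in seen:
        seen.add(n)
        n = digit_square_sum(n)
    return n == 1


print([n for n in range(1, 50) if is_happy(n)])
```

No buffer is allocated per iteration here, and the only container that grows is the set of values already visited for the current number. Whether the same changes pay off in the C++ version still has to be measured.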
Here's a C# version:
using System;
using System.Collections.Generic;
using System.Text;

namespace CSharp
{
    class Program
    {
        static void Main (string [] args)
        {
            while (true)
            {
                Console.Write ("Pick an upper bound: ");
                String input = Console.ReadLine ();
                uint upper_bound;
                if (uint.TryParse (input, out upper_bound))
                {
                    DateTime start = DateTime.Now;
                    CalcHappyNumbers (upper_bound);
                    DateTime end = DateTime.Now;
                    TimeSpan span = end - start;
                    Console.WriteLine ("Time taken = " + span.TotalSeconds + " seconds.");
                }
                else
                {
                    Console.WriteLine ("Error in input, unable to parse '" + input + "'.");
                }
            }
        }

        enum State
        {
            Happy,
            Sad,
            Unknown
        }

        static void CalcHappyNumbers (uint upper_bound)
        {
            SortedDictionary<uint, State> happy = new SortedDictionary<uint, State> ();
            SortedDictionary<uint, bool> happy_numbers = new SortedDictionary<uint, bool> ();
            happy [1] = State.Happy;
            happy_numbers [1] = true;
            for (uint current = 2 ; current < upper_bound ; ++current)
            {
                FindState (ref happy, ref happy_numbers, current);
            }
            //foreach (KeyValuePair<uint, bool> pair in happy_numbers)
            //{
            //    Console.Write (pair.Key.ToString () + ", ");
            //}
            //Console.WriteLine ("");
        }

        static State FindState (ref SortedDictionary<uint, State> happy, ref SortedDictionary<uint,bool> happy_numbers, uint value)
        {
            State current_state;
            if (happy.TryGetValue (value, out current_state))
            {
                if (current_state == State.Unknown)
                {
                    happy [value] = State.Sad;
                }
            }
            else
            {
                happy [value] = current_state = State.Unknown;
                uint new_value = 0;
                for (uint i = value ; i != 0 ; i /= 10)
                {
                    uint lsd = i % 10;
                    new_value += lsd * lsd;
                }
                if (new_value == 1)
                {
                    current_state = State.Happy;
                }
                else
                {
                    current_state = FindState (ref happy, ref happy_numbers, new_value);
                }
                if (current_state == State.Happy)
                {
                    happy_numbers [value] = true;
                }
                happy [value] = current_state;
            }
            return current_state;
        }
    }
}
I compared it against Dr_Asik's C++ code. For an upper bound of 100000 the C++ version ran in about 2.9 seconds and the C# version in 0.35 seconds. Both were compiled using Dev Studio 2005 using default release build options and both were executed from a command prompt.
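This is my second answer, which caches things like the sum of squares for values <= 10**6:
happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
That is, the number is split at happy_base into a low part and a high part, the sum of squared digits of each part is looked up in sq_list, the two sums are added, and the total is looked up in happy_list.
I don't think the Python version can be made much faster than that (OK, if you throw away the fallback to the old version, that is, the try: overhead, it's 10% faster).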
I think this is an excellent question which shows that, indeed,
Ok, here it goes (2nd version now...):
#!/usr/bin/env python3
'''Provides slower and faster versions of a function to compute happy numbers.

slow_happy() implements the algorithm as in the definition of happy
numbers (but also caches the results).

happy() uses the precomputed lists of sums of squares and happy numbers
to return result in just 3 list lookups and 3 arithmetic operations for
numbers less than 10**6; it falls back to slow_happy() for big numbers.

Utilities: digits() generator, my_timeit() context manager.
'''

from time import time  # For my_timeit.
from random import randint  # For example with random number.

upperBound = 10**5  # Default value, can be overridden by user.


class my_timeit:
    '''Very simple timing context manager.'''

    def __init__(self, message):
        self.message = message
        self.start = time()

    def __enter__(self):
        return self

    def __exit__(self, *data):
        print(self.message.format(time() - self.start))


def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10


def slow_happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''Tell if the number is happy or not, caching results.

    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second
    contains only happy ones.

    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.
    '''
    # This is commented out because <= is expensive.
    # assert {1} <= happies <= known

    if number in known:
        return number in happies

    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break

    known.update(history)
    if number in happies:
        happies.update(history)
        return True


# This will define new happy() to be much faster ------------------------.

with my_timeit('Preparation time was {0} seconds.\n'):

    LogAbsoluteUpperBound = 6  # The maximum possible number is 10**this.
    happy_list = [slow_happy(x)
                  for x in range(81*LogAbsoluteUpperBound + 1)]
    happy_base = 10**((LogAbsoluteUpperBound + 1)//2)
    sq_list = [sum(d**2 for d in digits(x))
               for x in range(happy_base + 1)]

    def happy(x):
        '''Tell if the number is happy, optimized for smaller numbers.

        This function works fast for numbers <= 10**LogAbsoluteUpperBound.
        '''
        try:
            return happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
        except IndexError:
            return slow_happy(x)

# End of happy()'s redefinition -----------------------------------------.


def calcMain(print_numbers, upper_bound):
    happies = [x for x in range(upper_bound + 1) if happy(x)]
    if print_numbers:
        print(happies)


if __name__ == '__main__':
    while True:
        upperBound = eval(input(
            "Pick an upper bound [{0} default, 0 ends, negative number prints]: "
            .format(upperBound)).strip() or repr(upperBound))
        if not upperBound:
            break
        with my_timeit('This computation took {0} seconds.'):
            calcMain(upperBound < 0, abs(upperBound))
        single = 0
        while not happy(single):
            single = randint(1, 10**12)
        print('FYI, {0} is {1}.\n'.format(single,
                    'happy' if happy(single) else 'unhappy'))
    print('Nice to see you, goodbye!')
Just to get a little more closure on this issue by seeing how fast I could truly find these numbers, I wrote a multithreaded C++ implementation of Dr_Asik's algorithm. There are two things that are important to realize about the fact that this implementation is multithreaded.
More threads do not necessarily lead to better execution times; there is a happy medium for every situation, depending on the volume of numbers you want to calculate.
If you compare the times between this version running with one thread and the original version, the only factors that could cause a difference in time are the overhead from starting the thread and variable system performance issues. Otherwise, the algorithm is the same.
The code for this implementation (all credit for the algorithm goes to Dr_Asik) is here. Also, I wrote some speed tests, with a double check for each test, to help back up those points.
Calculation of the first 100,000,000 happy numbers (each entry shows the first run / the double-check run):
Original - 39.061 / 39.000 (Dr_Asik's original implementation)
1 Thread - 39.000 / 39.079
2 Threads - 19.750 / 19.890
10 Threads - 11.872 / 11.888
30 Threads - 10.764 / 10.827
50 Threads - 10.624 / 10.561 <--
100 Threads - 11.060 / 11.216
500 Threads - 13.385 / 12.527
From these results it looks like our happy medium is about 50 threads, plus or minus ten or so.
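For readers who want to experiment with the effect of the worker count, here is a minimal Python sketch of the same idea: split the range into chunks and hand each chunk to a worker. It is not the C++ implementation discussed above, and `is_happy`, `happy_in_range`, and `happy_numbers_threaded` are hypothetical names made up for this example. It assumes `upper_bound >= 1` and `workers >= 1`. Note also that in standard CPython builds the global interpreter lock keeps threads from speeding up pure-Python CPU-bound work, so this shows only the partitioning, not the speedups in the table.

```python
from concurrent.futures import ThreadPoolExecutor


def is_happy(n):
    """Return True if n reaches 1 under repeated sum of squared digits."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        total = 0
        while n:
            n, digit = divmod(n, 10)
            total += digit * digit
        n = total
    return n == 1


def happy_in_range(bounds):
    """Collect the happy numbers in the half-open range [start, stop)."""
    start, stop = bounds
    return [n for n in range(start, stop) if is_happy(n)]


def happy_numbers_threaded(upper_bound, workers):
    """Split 1..upper_bound into about `workers` chunks, one task per chunk."""
    chunk = -(-upper_bound // workers)  # ceiling division
    ranges = [(lo, min(lo + chunk, upper_bound + 1))
              for lo in range(1, upper_bound + 1, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        parts = list(pool.map(happy_in_range, ranges))
    return [n for part in parts for n in part]


print(happy_numbers_threaded(100, 4))
```

Each chunk is independent, so the result is the same for any worker count; only the scheduling overhead changes, which is the trade-off the timings above are probing.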