Unusual Speed Difference between Python and C++

后端未结

关注

 17  2066

I recently wrote a short algorithm to calculate happy numbers in python. The program allows you to pick an upper bound and it will determine all the happy numbers below it.

相关标签:

17条回答

陌清茗

2020-12-22 21:57

I am not an expert at C++ optimization, but I believe the speed difference may be due to the fact that Python lists have preallocated more space at the beginning while your C++ vectors must reallocate and possibly copy every time it grows.

As for GMan's comment about find, I believe that the Python "in" operator is also a linear search and is the same speed.

Edit

Also I just noticed that you rolled your own pow function. There is no need to do that and the stdlib is likely faster.

0 讨论(0)
发布评论:

提交评论
- 加载中...
刺人心

2020-12-22 22:00
I can see that you have quite a few heap allocations that are unnecessary

For example:
```
while(!next)
        {
            char* buffer = new char[10];
```
This doesn't look very optimized. So, you probably want to have the array pre-allocated and using it inside your loop. This is a basic optimizing technique which is easy to spot and to do. It might become into a mess too, so be careful with that.

You are also using the atoi() function, which I don't really know if it is really optimized. Maybe doing a modulus 10 and getting the digit might be better (you have to measure thou, I didn't test this).

The fact that you have a linear search (inVector) might be bad. Replacing the vector data structure with a std::set might speed things up. A hash_set could do the trick too.

But I think that the worst problem is the string and this allocation of stuff on the heap inside that loop. That doesn't look good. I would try at those places first.
0 讨论(0)
发布评论:

提交评论
- 加载中...

無奈伤痛

2020-12-22 22:01

Here's a C# version:

using System;
using System.Collections.Generic;
using System.Text;

namespace CSharp
{
  class Program
  {
    static void Main (string [] args)
    {
      while (true)
      {
        Console.Write ("Pick an upper bound: ");

        String
          input = Console.ReadLine ();

        uint
          upper_bound;

        if (uint.TryParse (input, out upper_bound))
        {
          DateTime
            start = DateTime.Now;

          CalcHappyNumbers (upper_bound);

          DateTime
            end = DateTime.Now;

          TimeSpan
            span = end - start;

          Console.WriteLine ("Time taken = " + span.TotalSeconds + " seconds.");
        }
        else
        {
          Console.WriteLine ("Error in input, unable to parse '" + input + "'.");
        }
      }
    }

    enum State
    {
      Happy,
      Sad,
      Unknown
    }

    static void CalcHappyNumbers (uint upper_bound)
    {
      SortedDictionary<uint, State>
        happy = new SortedDictionary<uint, State> ();

      SortedDictionary<uint, bool>
        happy_numbers = new SortedDictionary<uint, bool> ();

      happy [1] = State.Happy;
      happy_numbers [1] = true;

      for (uint current = 2 ; current < upper_bound ; ++current)
      {
        FindState (ref happy, ref happy_numbers, current);
      }

      //foreach (KeyValuePair<uint, bool> pair in happy_numbers)
      //{
      //  Console.Write (pair.Key.ToString () + ", ");
      //}

      //Console.WriteLine ("");
    }

    static State FindState (ref SortedDictionary<uint, State> happy, ref SortedDictionary<uint,bool> happy_numbers, uint value)
    {
      State
        current_state;

      if (happy.TryGetValue (value, out current_state))
      {
        if (current_state == State.Unknown)
        {
          happy [value] = State.Sad;
        }
      }
      else
      {
        happy [value] = current_state = State.Unknown;

        uint
          new_value = 0;

        for (uint i = value ; i != 0 ; i /= 10)
        {
          uint
            lsd = i % 10;

          new_value += lsd * lsd;
        }

        if (new_value == 1)
        {
          current_state = State.Happy;
        }
        else
        {
          current_state = FindState (ref happy, ref happy_numbers, new_value);
        }

        if (current_state == State.Happy)
        {
          happy_numbers [value] = true;
        }

        happy [value] = current_state;
      }

      return current_state;
    }
  }
}

I compared it against Dr_Asik's C++ code. For an upper bound of 100000 the C++ version ran in about 2.9 seconds and the C# version in 0.35 seconds. Both were compiled using Dev Studio 2005 using default release build options and both were executed from a command prompt.

0 讨论(0)

旧巷少年郎

2020-12-22 22:02

This is my second answer; which caches things like sum of squares for values <= 10**6:

        happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]

That is,

the number is split into 3 digits + 3 digits
the precomputed table is used to get sum of squares for both parts
these two results are added
the precomputed table is consulted to get the happiness of number:

I don't think Python version can be made much faster than that (ok, if you throw away fallback to old version, that is try: overhead, it's 10% faster).

I think this is an excellent question which shows that, indeed,

things that have to be fast should be written in C
however, usually you don't need things to be fast (even if you needed the program to run for a day, it would be less then the combined time of programmers optimizing it)
it's easier and faster to write programs in Python
but for some problems, especially computational ones, a C++ solution, like the ones above, are actually more readable and more beautiful than an attempt to optimize Python program.

Ok, here it goes (2nd version now...):

#!/usr/bin/env python3
'''Provides slower and faster versions of a function to compute happy numbers.

slow_happy() implements the algorithm as in the definition of happy
numbers (but also caches the results).

happy() uses the precomputed lists of sums of squares and happy numbers
to return result in just 3 list lookups and 3 arithmetic operations for
numbers less than 10**6; it falls back to slow_happy() for big numbers.

Utilities: digits() generator, my_timeit() context manager.

'''


from time import time  # For my_timeit.
from random import randint # For example with random number.

upperBound = 10**5  # Default value, can be overridden by user.


class my_timeit:
    '''Very simple timing context manager.'''

    def __init__(self, message):
        self.message = message
        self.start = time()

    def __enter__(self):
        return self

    def __exit__(self, *data):
        print(self.message.format(time() - self.start))


def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10


def slow_happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''Tell if the number is happy or not, caching results.

    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second 
    contains only happy ones.

    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.

    '''
    # This is commented out because <= is expensive.
    # assert {1} <= happies <= known 

    if number in known:
        return number in happies

    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break

    known.update(history)
    if number in happies:
        happies.update(history)
        return True


# This will define new happy() to be much faster ------------------------.

with my_timeit('Preparation time was {0} seconds.\n'):

    LogAbsoluteUpperBound = 6 # The maximum possible number is 10**this.
    happy_list = [slow_happy(x)
                  for x in range(81*LogAbsoluteUpperBound + 1)]
    happy_base = 10**((LogAbsoluteUpperBound + 1)//2)
    sq_list = [sum(d**2 for d in digits(x))
               for x in range(happy_base + 1)]

    def happy(x):
        '''Tell if the number is happy, optimized for smaller numbers.

        This function works fast for numbers <= 10**LogAbsoluteUpperBound.

        '''
        try:
            return happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
        except IndexError:
            return slow_happy(x)

# End of happy()'s redefinition -----------------------------------------.


def calcMain(print_numbers, upper_bound):
    happies = [x for x in range(upper_bound + 1) if happy(x)]
    if print_numbers:
        print(happies)


if __name__ == '__main__':
    while True:

        upperBound = eval(input(
            "Pick an upper bound [{0} default, 0 ends, negative number prints]: "
            .format(upperBound)).strip() or repr(upperBound))
        if not upperBound:
            break

        with my_timeit('This computation took {0} seconds.'):
            calcMain(upperBound < 0, abs(upperBound))

        single = 0
        while not happy(single):
            single = randint(1, 10**12)
        print('FYI, {0} is {1}.\n'.format(single,
                    'happy' if happy(single) else 'unhappy')) 

    print('Nice to see you, goodbye!')

0 讨论(0)

青春惊慌失措

2020-12-22 22:03
Just to get a little more closure on this issue by seeing how fast I could truely find these numbers, I wrote a multithreaded C++ implementation of Dr_Asik's algorithm. There are two things that are important to realize about the fact that this implementation is multithreaded.
1. More threads does not necessarily lead to better execution times, there is a happy medium for every situation depending on the volume of numbers you want to calculate.
2. If you compare the times between this version running with one thread and the original version, the only factors that could cause a difference in time are the overhead from starting the thread and variable system performance issues. Otherwise, the algorithm is the same.
The code for this implementation (all credit for the algorithm goes to Dr_Asik) is here. Also, I wrote some speed tests with a double check for each test to help back up those 3 points.

Calculation of the first 100,000,000 happy numbers:

Original - 39.061 / 39.000 (Dr_Asik's original implementation)
1 Thread - 39.000 / 39.079
2 Threads - 19.750 / 19.890
10 Threads - 11.872 / 11.888
30 Threads - 10.764 / 10.827
50 Threads - 10.624 / 10.561 <--
100 Threads - 11.060 / 11.216
500 Threads - 13.385 / 12.527

From these results it looks like our happy medium is about 50 threads, plus or minus ten or so.
0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2 3