Unusual Speed Difference between Python and C++

庸人自扰 2020-12-22 21:25

I recently wrote a short algorithm to calculate happy numbers in python. The program allows you to pick an upper bound and it will determine all the happy numbers below it.
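For context, a number is happy if repeatedly replacing it with the sum of the squares of its digits eventually reaches 1. The asker's program is not reproduced in this thread, so the following is only a minimal sketch of what such a program might look like; `is_happy` and `happy_numbers_below` are illustrative names, not the original code.

```python
def is_happy(n):
    """Return True if n reaches 1 under repeated digit-square summing."""
    seen = set()  # values already visited; a repeat means we are in a cycle
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(d) ** 2 for d in str(n))
    return n == 1


def happy_numbers_below(upper_bound):
    """List every happy number in the range 1 <= n < upper_bound."""
    return [n for n in range(1, upper_bound) if is_happy(n)]


print(happy_numbers_below(50))
```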

17 answers
  • 2020-12-22 21:57

    I am not an expert at C++ optimization, but I believe the speed difference may be due to the fact that Python lists preallocate extra space up front, while your C++ vectors must reallocate and possibly copy their contents whenever they outgrow their capacity.

    As for GMan's comment about find, I believe that Python's "in" operator on a list is also a linear search, so it should be no faster.
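To illustrate the point about linear search, here is a small sketch comparing membership tests on a list and on a set. The sizes and repeat counts are arbitrary choices for illustration, and the timings will vary by machine.

```python
from timeit import timeit

values = list(range(100_000))
as_list = values          # "in" scans the list element by element
as_set = set(values)      # "in" is a hash lookup
missing = -1              # worst case for the list: every element is checked

list_time = timeit(lambda: missing in as_list, number=20)
set_time = timeit(lambda: missing in as_set, number=20)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Both containers give the same answer; only the cost of finding it differs.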

    Edit

    Also, I just noticed that you rolled your own pow function. There is no need to do that, and the standard library version is likely faster.

  • 2020-12-22 22:00

    I can see that you have quite a few unnecessary heap allocations.

    For example:

    while(!next)
            {
                char* buffer = new char[10];
    

    This doesn't look very optimized. You probably want to allocate the array once, outside the loop, and reuse it inside. This is a basic optimization that is easy to spot and easy to apply. It can also turn into a mess, so be careful with it.

    You are also using the atoi() function, and I don't know how well optimized it is. Taking the number modulo 10 to get each digit might be better (you have to measure, though; I didn't test this).

    The fact that you have a linear search (inVector) might be bad. Replacing the vector data structure with a std::set might speed things up. A hash_set could do the trick too.

    But I think that the worst problem is the string handling and the heap allocation inside that loop. That doesn't look good. I would look at those places first.
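The modulus suggestion above can be sketched as follows. This is written in Python for consistency with the rest of the thread rather than in C++, and the function names are made up for illustration; the same loop translates directly to `%` and `/` on integers in C++, with no buffer or string conversion involved.

```python
def sum_of_squared_digits(n):
    """Sum the squares of the decimal digits of n using only arithmetic."""
    total = 0
    while n:
        n, digit = divmod(n, 10)  # peel off the least significant digit
        total += digit * digit
    return total


def sum_of_squared_digits_via_string(n):
    """Same result, but converts to a string first (extra allocation)."""
    return sum(int(ch) ** 2 for ch in str(n))


print(sum_of_squared_digits(19))  # 1*1 + 9*9
```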

  • 2020-12-22 22:01

    Here's a C# version:

    using System;
    using System.Collections.Generic;
    using System.Text;
    
    namespace CSharp
    {
      class Program
      {
        static void Main (string [] args)
        {
          while (true)
          {
            Console.Write ("Pick an upper bound: ");
    
            String
              input = Console.ReadLine ();
    
            uint
              upper_bound;
    
            if (uint.TryParse (input, out upper_bound))
            {
              DateTime
                start = DateTime.Now;
    
              CalcHappyNumbers (upper_bound);
    
              DateTime
                end = DateTime.Now;
    
              TimeSpan
                span = end - start;
    
              Console.WriteLine ("Time taken = " + span.TotalSeconds + " seconds.");
            }
            else
            {
              Console.WriteLine ("Error in input, unable to parse '" + input + "'.");
            }
          }
        }
    
        enum State
        {
          Happy,
          Sad,
          Unknown
        }
    
        static void CalcHappyNumbers (uint upper_bound)
        {
          SortedDictionary<uint, State>
            happy = new SortedDictionary<uint, State> ();
    
          SortedDictionary<uint, bool>
            happy_numbers = new SortedDictionary<uint, bool> ();
    
          happy [1] = State.Happy;
          happy_numbers [1] = true;
    
          for (uint current = 2 ; current < upper_bound ; ++current)
          {
            FindState (ref happy, ref happy_numbers, current);
          }
    
          //foreach (KeyValuePair<uint, bool> pair in happy_numbers)
          //{
          //  Console.Write (pair.Key.ToString () + ", ");
          //}
    
          //Console.WriteLine ("");
        }
    
        static State FindState (ref SortedDictionary<uint, State> happy, ref SortedDictionary<uint,bool> happy_numbers, uint value)
        {
          State
            current_state;
    
          if (happy.TryGetValue (value, out current_state))
          {
            if (current_state == State.Unknown)
            {
              happy [value] = State.Sad;
            }
          }
          else
          {
            happy [value] = current_state = State.Unknown;
    
            uint
              new_value = 0;
    
            for (uint i = value ; i != 0 ; i /= 10)
            {
              uint
                lsd = i % 10;
    
              new_value += lsd * lsd;
            }
    
            if (new_value == 1)
            {
              current_state = State.Happy;
            }
            else
            {
              current_state = FindState (ref happy, ref happy_numbers, new_value);
            }
    
            if (current_state == State.Happy)
            {
              happy_numbers [value] = true;
            }
    
            happy [value] = current_state;
          }
    
          return current_state;
        }
      }
    }
    

    I compared it against Dr_Asik's C++ code. For an upper bound of 100000 the C++ version ran in about 2.9 seconds and the C# version in 0.35 seconds. Both were compiled using Dev Studio 2005 using default release build options and both were executed from a command prompt.

  • 2020-12-22 22:02

    This is my second answer, which caches things like the sum of squares for values <= 10**6:

            happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
    

    That is,

    • the number is split into 3 digits + 3 digits
    • the precomputed table is used to get sum of squares for both parts
    • these two results are added
    • a second precomputed table is consulted to get the happiness of the number.

    I don't think the Python version can be made much faster than that (OK, if you throw away the fallback to the old version, that is, the try: overhead, it's 10% faster).
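As a worked example of the split-and-lookup steps listed above, here is a short trace for a single input. The value 123456 and the helper `digit_square_sum` are chosen only for illustration; the table name and base mirror the answer's `sq_list` and `happy_base`.

```python
def digit_square_sum(n):
    """Sum of the squares of the decimal digits of n."""
    total = 0
    while n:
        n, d = divmod(n, 10)
        total += d * d
    return total


happy_base = 1000
sq_list = [digit_square_sum(x) for x in range(happy_base + 1)]

x = 123456
low, high = x % happy_base, x // happy_base   # 456 and 123
combined = sq_list[low] + sq_list[high]       # 77 + 14
print(low, high, sq_list[low], sq_list[high], combined)
```

The final step is a single lookup of `combined` in the precomputed happiness table, which only needs entries up to 6 * 81 = 486 for six-digit inputs.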

    I think this is an excellent question which shows that, indeed,

    • things that have to be fast should be written in C
    • however, usually you don't need things to be fast (even if the program had to run for a day, that would be less than the combined time programmers would spend optimizing it)
    • it's easier and faster to write programs in Python
    • but for some problems, especially computational ones, a C++ solution like the ones above is actually more readable and more beautiful than an attempt to optimize a Python program.

    Ok, here it goes (2nd version now...):

    #!/usr/bin/env python3
    '''Provides slower and faster versions of a function to compute happy numbers.
    
    slow_happy() implements the algorithm as in the definition of happy
    numbers (but also caches the results).
    
    happy() uses the precomputed lists of sums of squares and happy numbers
    to return result in just 3 list lookups and 3 arithmetic operations for
    numbers less than 10**6; it falls back to slow_happy() for big numbers.
    
    Utilities: digits() generator, my_timeit() context manager.
    
    '''
    
    
    from time import time  # For my_timeit.
    from random import randint # For example with random number.
    
    upperBound = 10**5  # Default value, can be overridden by user.
    
    
    class my_timeit:
        '''Very simple timing context manager.'''
    
        def __init__(self, message):
            self.message = message
            self.start = time()
    
        def __enter__(self):
            return self
    
        def __exit__(self, *data):
            print(self.message.format(time() - self.start))
    
    
    def digits(x:'nonnegative number') -> "yields number's digits":
        if not (x >= 0): raise ValueError('Number should be nonnegative')
        while x:
            yield x % 10
            x //= 10
    
    
    def slow_happy(number, known = {1}, happies = {1}) -> 'True/None':
        '''Tell if the number is happy or not, caching results.
    
        It uses two static variables, parameters known and happies; the
        first one contains known happy and unhappy numbers; the second 
        contains only happy ones.
    
        If you want, you can pass your own known and happies arguments. If
        you do, make sure they satisfy the assertion commented out below.
    
        '''
        # This is commented out because <= is expensive.
        # assert {1} <= happies <= known 
    
        if number in known:
            return number in happies
    
        history = set()
        while True:
            history.add(number)
            number = sum(x**2 for x in digits(number))
            if number in known or number in history:
                break
    
        known.update(history)
        if number in happies:
            happies.update(history)
            return True
    
    
    # This will define new happy() to be much faster ------------------------.
    
    with my_timeit('Preparation time was {0} seconds.\n'):
    
        LogAbsoluteUpperBound = 6 # The maximum possible number is 10**this.
        happy_list = [slow_happy(x)
                      for x in range(81*LogAbsoluteUpperBound + 1)]
        happy_base = 10**((LogAbsoluteUpperBound + 1)//2)
        sq_list = [sum(d**2 for d in digits(x))
                   for x in range(happy_base + 1)]
    
        def happy(x):
            '''Tell if the number is happy, optimized for smaller numbers.
    
            This function works fast for numbers <= 10**LogAbsoluteUpperBound.
    
            '''
            try:
                return happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
            except IndexError:
                return slow_happy(x)
    
    # End of happy()'s redefinition -----------------------------------------.
    
    
    def calcMain(print_numbers, upper_bound):
        happies = [x for x in range(upper_bound + 1) if happy(x)]
        if print_numbers:
            print(happies)
    
    
    if __name__ == '__main__':
        while True:
    
            upperBound = eval(input(
                "Pick an upper bound [{0} default, 0 ends, negative number prints]: "
                .format(upperBound)).strip() or repr(upperBound))
            if not upperBound:
                break
    
            with my_timeit('This computation took {0} seconds.'):
                calcMain(upperBound < 0, abs(upperBound))
    
            single = 0
            while not happy(single):
                single = randint(1, 10**12)
            print('FYI, {0} is {1}.\n'.format(single,
                        'happy' if happy(single) else 'unhappy')) 
    
        print('Nice to see you, goodbye!')
    
  • 2020-12-22 22:03

    Just to get a little more closure on this issue, and to see how fast I could truly find these numbers, I wrote a multithreaded C++ implementation of Dr_Asik's algorithm. There are two things that are important to realize about the fact that this implementation is multithreaded.

    1. More threads do not necessarily lead to better execution times; there is a happy medium for every situation, depending on the volume of numbers you want to calculate.

    2. If you compare the times between this version running with one thread and the original version, the only factors that could cause a difference in time are the overhead from starting the thread and variable system performance issues. Otherwise, the algorithm is the same.

    The code for this implementation (all credit for the algorithm goes to Dr_Asik) is here. Also, I wrote some speed tests with a double check for each test to help back up those two points.

    Calculation of the first 100,000,000 happy numbers:

    Original - 39.061 / 39.000 (Dr_Asik's original implementation)
    1 Thread - 39.000 / 39.079
    2 Threads - 19.750 / 19.890
    10 Threads - 11.872 / 11.888
    30 Threads - 10.764 / 10.827
    50 Threads - 10.624 / 10.561 <--
    100 Threads - 11.060 / 11.216
    500 Threads - 13.385 / 12.527

    From these results it looks like our happy medium is about 50 threads, plus or minus ten or so.
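The partitioning idea behind a multithreaded version can be sketched as below. This is not the linked C++ implementation; it is a Python illustration with made-up names (`count_happy`, `count_happy_parallel`) that splits the range into one chunk per worker. Note that in standard CPython builds with the GIL, threads do not speed up CPU-bound work like this, so the sketch shows only the structure; `ProcessPoolExecutor` would be the usual substitute if real parallelism is wanted.

```python
from concurrent.futures import ThreadPoolExecutor


def is_happy(n):
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(d) ** 2 for d in str(n))
    return n == 1


def count_happy(start, stop):
    """Count happy numbers in the half-open range [start, stop)."""
    return sum(1 for n in range(start, stop) if is_happy(n))


def count_happy_parallel(upper_bound, workers):
    """Split [1, upper_bound) into roughly equal chunks, one per worker."""
    step = -(-upper_bound // workers)  # ceiling division
    chunks = [(lo, min(lo + step, upper_bound))
              for lo in range(1, upper_bound, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_happy(*c), chunks))


print(count_happy_parallel(10_000, workers=4))
```

Each chunk is independent, which is why the worker count can be tuned freely, as in the timings above.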
