Unusual Speed Difference between Python and C++

Tag: backend · Unresolved · 2065 views
Asked by 庸人自扰 on 2020-12-22 21:25

I recently wrote a short algorithm in Python to calculate happy numbers. The program lets you pick an upper bound, and it then determines all the happy numbers below it.
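For reference: a happy number is one that reaches 1 when it is repeatedly replaced by the sum of the squares of its digits; any other number ends up in a cycle that never contains 1. The asker's code is not reproduced in this thread, so the following is only a minimal C++ sketch of that definition, not the original program. The names `digit_square_sum`, `is_happy` and `happy_numbers_up_to` are hypothetical, and the bound is treated as inclusive, as in the `upperBound+1` range of the Python answer further down.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Sum of the squares of the decimal digits of n, e.g. 19 -> 1 + 81 = 82.
unsigned digit_square_sum(unsigned n) {
    unsigned total = 0;
    while (n > 0) {
        unsigned digit = n % 10;
        total += digit * digit;
        n /= 10;
    }
    return total;
}

// Iterates n -> digit_square_sum(n) until it reaches 1 (happy) or
// revisits a value (a cycle without 1, so unhappy).
bool is_happy(unsigned n) {
    std::unordered_set<unsigned> seen;
    while (n != 1 && seen.insert(n).second) {
        n = digit_square_sum(n);
    }
    return n == 1;
}

// All happy numbers in [1, upper_bound].
std::vector<unsigned> happy_numbers_up_to(unsigned upper_bound) {
    std::vector<unsigned> result;
    for (unsigned i = 1; i <= upper_bound; ++i) {
        if (is_happy(i)) result.push_back(i);
    }
    return result;
}
```

For example, `happy_numbers_up_to(50)` yields 1, 7, 10, 13, 19, 23, 28, 31, 32, 44 and 49.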

17 answers
  • 2020-12-22 21:47

    Here's some food for thought: if given the choice of running a 1979 algorithm for finding prime numbers on a 2009 computer, or a 2009 algorithm on a 1979 computer, which would you choose?

    The new algorithm on ancient hardware would be the better choice by a huge margin. Have a look at your "helper" functions.

  • 2020-12-22 21:47

    With optimizations similar to PotatoSwatter's, I got the time for 10000 numbers down from 1.063 seconds to 0.062 seconds (in the original, I had replaced itoa with the standard sprintf).

    With all the memory optimizations (don't pass containers by value, since in C++ you have to decide explicitly whether you want a copy or a reference; move operations that allocate memory out of inner loops; if you already have the number in a char buffer, there is no point copying it to std::string, etc.), I got it down to 0.532 seconds.
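    To illustrate the "move allocations out of inner loops" point, here is a hedged C++ sketch; `digit_square_sum` and `count_happy` are hypothetical names, not code from this answer. The per-number history vector is declared once outside the loop and cleared for each start value, instead of being constructed inside the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sum of the squares of the decimal digits of n.
int digit_square_sum(int n) {
    int total = 0;
    for (; n > 0; n /= 10) {
        int digit = n % 10;
        total += digit * digit;
    }
    return total;
}

// Counts the happy numbers in [1, upper_bound]. The `history` buffer is
// created once and only cleared per start value, so the memory it has
// already grown is reused instead of being reallocated every iteration.
int count_happy(int upper_bound) {
    int count = 0;
    std::vector<int> history;  // hoisted out of the outer loop
    for (int i = 1; i <= upper_bound; ++i) {
        history.clear();  // removes the elements; capacity is left unchanged
        int n = i;
        // Stop at 1 (happy) or at the first repeated value (a cycle).
        while (n != 1 &&
               std::find(history.begin(), history.end(), n) == history.end()) {
            history.push_back(n);
            n = digit_square_sum(n);
        }
        if (n == 1) ++count;
    }
    return count;
}
```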

    The rest of the time came from using %10 to access digits, rather than converting numbers to string.
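    A hedged C++ comparison of the two ways of getting at the digits; `via_string` and `via_modulo` are illustrative names, not the poster's functions. Both compute the same sum of squared digits, but the string version first formats the number into characters (which may allocate) and then converts each character back into a digit, while the arithmetic version works on the integer directly.

```cpp
#include <cassert>
#include <string>

// Text route: format the number, then turn each character back into a digit.
unsigned via_string(unsigned n) {
    unsigned total = 0;
    for (char c : std::to_string(n)) {
        unsigned digit = static_cast<unsigned>(c - '0');
        total += digit * digit;
    }
    return total;
}

// Arithmetic route: read the last digit with % 10, drop it with / 10.
unsigned via_modulo(unsigned n) {
    unsigned total = 0;
    for (; n > 0; n /= 10) {
        unsigned digit = n % 10;
        total += digit * digit;
    }
    return total;
}
```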

    I suppose there might be another algorithmic-level optimization (numbers that you have encountered while finding a happy number are themselves also happy numbers?), but I don't know how much that gains (there are not that many happy numbers in the first place), and this optimization is not in the Python version either.
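    The idea holds because every value on the path from a happy number down to 1 reaches 1 by the same remaining steps, so it is happy too. A small C++ sketch that records that path makes this easy to check; `digit_square_sum`, `trajectory` and `is_happy` are hypothetical names, and a cache built on this could mark every recorded value at once.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sum of the squares of the decimal digits of n.
unsigned digit_square_sum(unsigned n) {
    unsigned total = 0;
    for (; n > 0; n /= 10) {
        unsigned digit = n % 10;
        total += digit * digit;
    }
    return total;
}

// Records n, s(n), s(s(n)), ... and stops after appending 1, or just before
// a value would repeat (the repeated value is not appended twice).
std::vector<unsigned> trajectory(unsigned n) {
    std::vector<unsigned> path;
    while (std::find(path.begin(), path.end(), n) == path.end()) {
        path.push_back(n);
        if (n == 1) break;
        n = digit_square_sum(n);
    }
    return path;
}

// A start value is happy exactly when its trajectory ends at 1.
bool is_happy(unsigned n) { return trajectory(n).back() == 1; }
```

    For instance, the trajectory of 7 is 7, 49, 97, 130, 10, 1, and each of those values is itself happy.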

    By the way, by not using string conversion and a list to square digits, I got the Python version from 0.825 seconds down to 0.33 seconds, too.

  • 2020-12-22 21:49

    It looks like you're passing vectors by value to other functions. This will be a significant slowdown because the program will actually make a full copy of your vector before it passes it to your function. To get around this, pass a constant reference to the vector instead of a copy. So instead of:

    int sum(vector<int> given)

    Use:

    int sum(const vector<int>& given)

    When you do this, you'll no longer be able to use vector::iterator, because begin() and end() on a const vector return const iterators. You'll need to replace it with vector::const_iterator.

    You can also pass in non-constant references, but in this case, you don't need to modify the parameter at all.
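    A minimal sketch of the suggested signature, assuming `sum` simply adds up the elements (its body is not shown in the thread). The second function is an alternative not mentioned in the answer: a C++11 range-based for loop, which avoids spelling out the iterator type at all.

```cpp
#include <cassert>
#include <vector>

// Pass by const reference: no copy is made, and the function cannot modify
// the caller's vector. begin()/end() on a const vector return const_iterator.
int sum(const std::vector<int>& given) {
    int total = 0;
    for (std::vector<int>::const_iterator it = given.begin(); it != given.end(); ++it) {
        total += *it;
    }
    return total;
}

// Same loop with a C++11 range-based for; still no copy of the vector.
int sum_range_for(const std::vector<int>& given) {
    int total = 0;
    for (int value : given) {
        total += value;
    }
    return total;
}
```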

  • 2020-12-22 21:49
    
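    #!/usr/bin/env python
    # Python 2 code: uses xrange, input() and the print statement.
    
    import timeit
    
    upperBound = 0
    
    def calcMain():
        known = set()          # happy numbers found so far
        for i in xrange(0,upperBound+1):
            done = False
            current = i
            history = set()    # values already reached from this i
            while not done:
                # Sum the squares of the digits arithmetically (no string conversion).
                squaresum=0
                while current > 0:
                    current, digit = divmod(current, 10)
                    squaresum += digit * digit
                current = squaresum
                # A repeated value means a cycle; i is happy only if that value is 1.
                if current in history:
                    done = True
                    if current == 1:
                        known.add(i)
                history.add(current)
    
    while True:
        upperBound = input("Pick an upper bound: ")
        result = timeit.Timer(calcMain).timeit(1)
        print result, "seconds.\n"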
    

    I made a couple of minor changes to your original Python code that give almost a 3x improvement in performance: they took the 100,000 case from about 9.64 seconds to about 3.38 seconds.

    The major change was to extract the digits with divmod (mod 10) and accumulate the squares in a while loop. A couple of other changes improved execution time by only fractions of a hundredth of a second. The first minor change was switching the main for loop from a range list to an xrange iterator. The second was using the set class instead of the list class for both the known and history variables. I also experimented with generator expressions and with precalculating the squares, but both made the code slower. I seem to be running a slower version of Python, or on a slower processor, than some of the other contributors. I would be interested in someone else's timing comparison of my Python code against one of the optimized C++ versions of the same algorithm. I also tried the python -O and -OO optimizations, but they had the reverse of the intended effect.

  • 2020-12-22 21:50

    Here is another way, which relies on memoizing all the numbers already explored. I get a speed-up of about 4-5x over DrAsik's code, which is oddly stable between 1000 and 1000000; I expected the cache to become more efficient the more numbers we explored. Otherwise, the same kind of classic optimizations have been applied. BTW, if the compiler supports NRVO (named return value optimization) or rvalue references, we wouldn't need to pass the vector as an out parameter.

    NB: micro-optimizations are still possible IMHO; moreover, the caching is naive, as it allocates much more memory than really needed.

    #include <cstddef>
    #include <vector>
    
    enum Status {
        never_seen,
        being_explored,
        happy,
        unhappy
    };
    
    char const* toString[] = { "never_seen", "being_explored", "happy", "unhappy" };
    
    
    inline size_t sum_squares(size_t i) {
        size_t s = 0;
        while (i) {
            const size_t digit = i%10;
            s += digit * digit;
            i /= 10;
        }
        return s ;
    }
    
    struct Cache {
        Cache(size_t dim) : m_cache(dim, never_seen) {}
        void set(size_t n, Status status) {
            if (m_cache.size() <= n) {
                m_cache.resize(n+1, never_seen);
            }
            m_cache[n] = status;
            // std::cout << "(c[" << n << "]<-"<<toString[status] << ")";
        }
        Status operator[](size_t n) const {
            if (m_cache.size() <= n) {
                return never_seen;
            } else {
                return m_cache[n];
            }
        }
    
    private:
        std::vector<Status> m_cache;
    };
    
    void search_happy_lh(size_t upper_bound, std::vector<size_t> & happy_numbers)
    {
        happy_numbers.clear();
        happy_numbers.reserve(upper_bound); // it doesn't improve performance much
    
        Cache cache(upper_bound+1);
        std::vector<size_t> current_stack;
    
        cache.set(1,happy);
        happy_numbers.push_back(1);
        for (size_t i = 2; i<=upper_bound ; ++i) {
            // std::cout << "\r" << i << std::flush;
            current_stack.clear();
            size_t s= i;
            while ( s != 1 && cache[s]==never_seen)
            {
                current_stack.push_back(s);
                cache.set(s, being_explored);
                s = sum_squares(s);
                // std::cout << " - " << s << std::flush;
            }
            const Status update_with = (cache[s]==being_explored ||cache[s]==unhappy) ? unhappy : happy;
            // std::cout << " => " << s << ":" << toString[update_with] << std::endl;
            for (size_t j=0; j!=current_stack.size(); ++j) {
                cache.set(current_stack[j], update_with);
            }
            if (cache[i] == happy) {
                happy_numbers.push_back(i);
            }
        }
    }
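    As the answer notes, NRVO or C++11 move semantics make the out parameter unnecessary. The following is a hedged sketch of that signature: it returns the vector by value and, to stay self-contained, folds the `Cache` struct into two small lambdas over a `std::vector<Status>`. The name `search_happy` and this restructuring are assumptions of the sketch, not the answerer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Status { never_seen, being_explored, happy, unhappy };

// Sum of the squares of the decimal digits of i.
std::size_t sum_squares(std::size_t i) {
    std::size_t s = 0;
    for (; i != 0; i /= 10) {
        const std::size_t digit = i % 10;
        s += digit * digit;
    }
    return s;
}

// Returns the happy numbers in [1, upper_bound] by value. `result` is a named
// local, so the compiler may build it directly in the caller (NRVO); if it
// does not, C++11 moves it instead of copying it.
std::vector<std::size_t> search_happy(std::size_t upper_bound) {
    std::vector<std::size_t> result;
    std::vector<Status> cache(upper_bound + 1, never_seen);
    auto status_of = [&cache](std::size_t n) {
        return n < cache.size() ? cache[n] : never_seen;
    };
    auto set_status = [&cache](std::size_t n, Status s) {
        if (n >= cache.size()) cache.resize(n + 1, never_seen);
        cache[n] = s;
    };

    set_status(1, happy);
    std::vector<std::size_t> path;  // reused across iterations
    for (std::size_t i = 1; i <= upper_bound; ++i) {
        path.clear();
        std::size_t s = i;
        // Walk until reaching a value that is decided or currently being explored.
        while (status_of(s) == never_seen) {
            path.push_back(s);
            set_status(s, being_explored);
            s = sum_squares(s);
        }
        // Reaching a value still being explored means a cycle without 1.
        const Status outcome = (status_of(s) == happy) ? happy : unhappy;
        for (std::size_t v : path) set_status(v, outcome);
        if (status_of(i) == happy) result.push_back(i);
    }
    return result;  // plain return: std::move(result) here would block NRVO
}
```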
    
  • 2020-12-22 21:51

    Well, I also gave it a once-over. I didn't test or even compile it, though.

    General rules for numerical programs:

    • Never process numbers as text. That's what makes lesser languages than Python slow, so if you do it in C, the program will be slower than Python.

    • Don't use data structures if you can avoid them. You were building an array just to add the numbers up. It's better to keep a running total.

    • Keep a copy of the STL reference open so you can use it rather than writing your own functions.
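    A small C++ illustration of the second and third rules, using hypothetical function names: the first version builds a temporary vector only to add it up (with `std::accumulate` from `<numeric>` rather than a hand-written helper), while the second keeps a running total and needs no container at all.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Builds an intermediate container, then sums it with the standard library.
int square_sum_with_vector(int n) {
    std::vector<int> squares;
    for (; n > 0; n /= 10) {
        int digit = n % 10;
        squares.push_back(digit * digit);
    }
    return std::accumulate(squares.begin(), squares.end(), 0);
}

// Keeps a running total: same result, no container and no allocation.
int square_sum_running(int n) {
    int total = 0;
    for (; n > 0; n /= 10) {
        int digit = n % 10;
        total += digit * digit;
    }
    return total;
}
```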


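    #include <algorithm>  // std::count
    #include <vector>
    
    using std::vector;
    
    void calcMain(int upperBound)
    {
        vector<int> known;                  // happy numbers found so far
        for(int i = 0; i <= upperBound; i++)
        {
            int current = i;
            vector<int> history;            // values reached from this i
            do
            {
                // Sum the squares of the digits with a running total.
                int squaresum = 0;
                for ( ; current; current /= 10 )
                {
                    int digit = current % 10;
                    squaresum += digit * digit;
                }
                current = squaresum;
                history.push_back(current);
                // Stop once the newest value already appeared earlier (a cycle).
            } while ( ! std::count(history.begin(), history.end() - 1, current) );
    
            if(current == 1)
            {
                known.push_back(i);
                //cout << i << "\t";
            }
    
        }
        //cout << "\n\n";
    }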
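    One further change, not part of this answer: the `history` vector and its linear `count` scan can be dropped entirely. It is a known property of this map that every unhappy positive integer eventually enters the cycle 4 → 16 → 37 → 58 → 89 → 145 → 42 → 20 → 4, so the loop can stop as soon as it reaches 1 or 4. The sketch below relies on that property; `is_happy_fixed_cycle` and `happy_numbers_up_to` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Sum of the squares of the decimal digits of n.
int digit_square_sum(int n) {
    int total = 0;
    for (; n > 0; n /= 10) {
        int digit = n % 10;
        total += digit * digit;
    }
    return total;
}

// Assumes the known property that every unhappy positive integer eventually
// enters the cycle 4 -> 16 -> 37 -> 58 -> 89 -> 145 -> 42 -> 20 -> 4, so the
// walk can stop at 1 or 4 without keeping any history.
bool is_happy_fixed_cycle(int n) {
    if (n <= 0) return false;  // 0 maps to itself and would never reach 1 or 4
    while (n != 1 && n != 4) {
        n = digit_square_sum(n);
    }
    return n == 1;
}

// Collects the happy numbers in [1, upper_bound].
std::vector<int> happy_numbers_up_to(int upper_bound) {
    std::vector<int> result;
    for (int i = 1; i <= upper_bound; ++i) {
        if (is_happy_fixed_cycle(i)) result.push_back(i);
    }
    return result;
}
```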
    