Why is my Python 3 implementation much faster than the one I wrote in C++?

那年仲夏 submitted on 2019-12-12 03:28:03

Question


I know that C++ should be much faster than Python 3 because it is a compiled language as opposed to an interpreted language.

I wrote two programs that use Monte Carlo simulation to estimate Pi, one in Python 3 and the other in C++.

Python turned out to be approximately 16x faster than C++. As seen in the results below, with a repetition value of 10,000,000, Python takes 8.5 seconds whilst C++ takes 137.4 seconds.

I'm new to C++, but I can't find posts online that explain this behavior.

According to this post, C++ in general should be 10x - 100x faster than Python, which is clearly not the case for me.

Please help me understand why Python is significantly faster than C++ in my case.

My results:

Monte Carlo Simulation (Estimation of Pi) in C++

Monte Carlo Simulation (Estimation of Pi) in Python 3

Python Source Code:

from random import random
import time
import sys

class MonteCarloSimulator(object):

    def __init__(self, value):
        self.value = value

        if sys.platform == "win32":
            self.G = ''
            self.R = ''
            self.END = ''
        else:
            self.G = '\033[92m'
            self.R = '\033[1;31m'
            self.END = '\033[0m'

    def unit_circle(self, x, y):
        if (x ** 2 + y ** 2) <= 1:
            return True
        else:
            return False

    def simulate(self):
        print("\nProcessing calculations with a repetition value of " + self.R +
        str(self.value) + self.END + " times.")

        area_of_circle = 0
        area_of_square = 0

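        # time.clock() was removed in Python 3.8; perf_counter() replaces it
        start = time.perf_counter()

        # range(self.value) runs exactly self.value iterations
        for i in range(self.value):
            x = random()
            y = random()

            if self.unit_circle(x, y):
                area_of_circle += 1
            area_of_square += 1

        pi = (area_of_circle * 4) / area_of_square

        runtime = time.perf_counter() - start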

        print("\tCalculated Pi = " + self.G + str(pi) + self.END +
        " ({0} seconds, {1} minutes)".format(round(runtime, 10),
        round(runtime / 60, 10)))

        print("Estimated Num of Pi is off by", abs(pi - 3.14159265359))

def main():
    values = [1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000, 10000000000]
    for value in values:
        MonteCarloSimulator(value).simulate()

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\nQuitting...")
        sys.exit(1)

C++ Source Code:

#include <iostream>                     // std library
#include <random>                       // random number generator
#include <ctime>                        // calculating runtime
#include <cmath>                        // absolute value function
#include "MonteCarloSimmulation.hpp"    // function prototypes

using namespace std;

const double g_PI {3.141592653589793238463};

int main()
{
    // repetition values
    long values[5] = {1000, 10000, 100000, 1000000, 10000000};//, 100000000, 1000000000, 10000000000};

    // runs the simulation with the different repetition values
    for (auto value : values)
        simulate(value);

    cout << "\nPress return to exit";
    cin.get();

    return 0;
}

/**
 * The actual simulation
 */
void simulate(unsigned long value)
{
    // start time for calculating runtime
    const clock_t startTime = clock();

    // area's variables
    unsigned long area_of_circle = 0;
    unsigned long area_of_square = 0;

    // print the repetition value
    cout << "\nProcessing calculations with a repetition value of " << value <<
    " times." << endl;

    for (unsigned long i = 0; i != value; i++)
    {
        // gets random values from 0 to 1 for (x) and (y)
        float x = randomFloat();
        float y = randomFloat();

        // checks if (x, y) are in a unit circle, if so increment circle area
        if (unit_circle(x, y))
            area_of_circle++;
        area_of_square++;
    }

    // pi = area of circle * 4 / area of square
    double calculatedPi = static_cast<double>(area_of_circle * 4) / area_of_square;

    float endTime = static_cast<float>(clock() - startTime) / CLOCKS_PER_SEC;

    // prints the value of calculated pi
    cout << "\tCalculated Value of Pi: " << calculatedPi <<
    " (" << endTime << " seconds, " << endTime/60 << " minutes)" << endl;

    // difference between the calc value and pi
    cout << "Estimated Num of Pi is off by " << abs(calculatedPi - g_PI) << '\n';
}

/**
 * returns a random number from 0 to 1
 */
float randomFloat()
{
    random_device rd;
    default_random_engine generator(rd()); // rd() provides a random seed
    uniform_real_distribution<float> distribution(0,1);

    float x = distribution(generator);

    return x;
}

/**
 * checks if the two input parameters are inside a unit circle
 */
bool unit_circle(float x, float y)
{
    if ((x*x + y*y) <= 1)
        return true;
    else
        return false;
}

Answer 1:


The main problem is that your C++ code constructs and reseeds a new random number generator for every random number it produces. Additionally, you're not compiling with optimizations enabled (-O3).

I moved the initialization of the random number generator outside the randomFloat function (equally, you could use static variables inside the function):

random_device rd;
default_random_engine generator(rd()); // rd() provides a random seed
uniform_real_distribution<float> distribution(0,1);

float randomFloat() {
    float x = distribution(generator);
    return x;
}

After that change and compiling with -O3, the C++ version is considerably faster than Python.
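
The cost of creating a generator for every draw can also be illustrated on the Python side. The following is only a rough analogy, not the asker's code: the function names are hypothetical, and random.Random() (which seeds itself from OS entropy when given no seed) stands in for the random_device plus engine pair in C++. The absolute timings depend on the machine and will not match the C++ numbers.

```python
import random
import time


def random_float_reseeded():
    # Hypothetical analogue of the original randomFloat(): a brand-new
    # generator is created and seeded from the OS for every single number.
    return random.Random().random()


_shared = random.Random()  # created and seeded exactly once


def random_float_shared():
    # Analogue of the fixed version: reuse one long-lived generator.
    return _shared.random()


def time_draws(draw, n=100_000):
    # Return the wall-clock seconds needed to call draw() n times.
    start = time.perf_counter()
    for _ in range(n):
        draw()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("new generator per draw:", time_draws(random_float_reseeded))
    print("one shared generator:  ", time_draws(random_float_shared))
```

Both functions return values in [0, 1); the only difference is how often the generator is constructed and seeded, which is the difference the fix above removes.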


Another possibility is that the Python and C++ code use different random number generators. Python's random module (implemented in C) uses the MT19937 Mersenne Twister, a fast PRNG well suited to numerical problems such as Monte Carlo; the algorithm behind default_random_engine in C++ is implementation-defined. As pointed out by Melak47, you can force the use of the MT19937 PRNG in C++ with:

mt19937 generator(rd());

or

mt19937_64 generator(rd());
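
For comparison, the Mersenne Twister behind Python's random module can be observed from Python itself. This is a small sketch that relies on a CPython implementation detail (the layout of the tuple returned by random.getstate()), so treat the state size as an assumption about CPython rather than a language guarantee.

```python
import random

# getstate() returns (version, internal_state, gauss_next). In CPython the
# internal state holds the 624 32-bit words of MT19937 plus a position index.
version, internal_state, gauss_next = random.getstate()
print(len(internal_state))  # 625 in CPython

# Seeding once and then drawing repeatedly gives a reproducible stream,
# which is how the generator is meant to be used in a simulation loop.
random.seed(12345)
first = [random.random() for _ in range(3)]
random.seed(12345)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```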

P.S. Python outperforming C++ is not unheard of; the C++ algorithms value genericity, whereas the Python algorithms are often heavily optimized for specific use cases. See, for example, this question on substring matching.




Answer 2:


The main cost is your randomFloat() C++ function.

Building a random_device, a default_random_engine and a uniform_real_distribution on every call is incredibly wasteful.

By making these static I was able to increase the speed of the C++ implementation by over a factor of 100. But you'd be better served by injecting them, or by wrapping this in a class and making them instance members.

#include <iostream>                     // std library
#include <random>                       // random number generator
#include <ctime>                        // calculating runtime
#include <cmath>                        // absolute value function

using namespace std;

const double g_PI {3.141592653589793238463};

void simulate(unsigned long value);
float randomFloat();
bool unit_circle(float x, float y);

int main()
{
    // repetition values
    long values[5] = {1000, 10000, 100000, 1000000, 10000000};//, 100000000, 1000000000, 10000000000};

    // runs the simulation with the different repetition values
    for (auto value : values)
        simulate(value);

    cout << "\nPress return to exit";
    cin.get();

    return 0;
}

/**
 * The actual simulation
 */
void simulate(unsigned long value)
{
    // start time for calculating runtime
    const clock_t startTime = clock();

    // area's variables
    unsigned long area_of_circle = 0;
    unsigned long area_of_square = 0;

    // print the repetition value
    cout << "\nProcessing calculations with a repetition value of " << value <<
    " times." << endl;

    for (unsigned long i = 0; i != value; i++)
    {
        // gets random values from 0 to 1 for (x) and (y)
        float x = randomFloat();
        float y = randomFloat();

        // checks if (x, y) are in a unit circle, if so increment circle area
        if (unit_circle(x, y))
            area_of_circle++;
        area_of_square++;
    }

    // pi = area of circle * 4 / area of square
    double calculatedPi = static_cast<double>(area_of_circle * 4) / area_of_square;

    float endTime = static_cast<float>(clock() - startTime) / CLOCKS_PER_SEC;

    // prints the value of calculated pi
    cout << "\tCalculated Value of Pi: " << calculatedPi <<
    " (" << endTime << " seconds, " << endTime/60 << " minutes)" << endl;

    // difference between the calc value and pi
    cout << "Estimated Num of Pi is off by " << abs(calculatedPi - g_PI) << '\n';
}

/**
 * returns a random number from 0 to 1
 */
float randomFloat()
{
    static random_device rd;
    static default_random_engine generator(rd()); // rd() provides a random seed
    static uniform_real_distribution<float> distribution(0,1);

    float x = distribution(generator);

    return x;
}

/**
 * checks if the two input parameters are inside a unit circle
 */
bool unit_circle(float x, float y)
{
    if ((x*x + y*y) <= 1)
        return true;
    else
        return false;
}

Original implementation log

Processing calculations with a repetition value of 1000 times.
    Calculated Value of Pi: 3.08 (0.019227 seconds, 0.00032045 minutes)
Estimated Num of Pi is off by 0.0615927

Processing calculations with a repetition value of 10000 times.
    Calculated Value of Pi: 3.124 (0.162044 seconds, 0.00270073 minutes)
Estimated Num of Pi is off by 0.0175927

Processing calculations with a repetition value of 100000 times.
    Calculated Value of Pi: 3.14568 (1.72181 seconds, 0.0286968 minutes)
Estimated Num of Pi is off by 0.00408735

//Couldn't be bothered to wait :P

Using static random generator

Processing calculations with a repetition value of 1000 times.
    Calculated Value of Pi: 3.136 (0.000144 seconds, 2.4e-06 minutes)
Estimated Num of Pi is off by 0.00559265

Processing calculations with a repetition value of 10000 times.
    Calculated Value of Pi: 3.1824 (0.000596 seconds, 9.93333e-06 minutes)
Estimated Num of Pi is off by 0.0408073

Processing calculations with a repetition value of 100000 times.
    Calculated Value of Pi: 3.14044 (0.005889 seconds, 9.815e-05 minutes)
Estimated Num of Pi is off by 0.00115265

Processing calculations with a repetition value of 1000000 times.
    Calculated Value of Pi: 3.14278 (0.058896 seconds, 0.0009816 minutes)
Estimated Num of Pi is off by 0.00118335

Processing calculations with a repetition value of 10000000 times.
    Calculated Value of Pi: 3.14165 (0.589034 seconds, 0.00981723 minutes)
Estimated Num of Pi is off by 6.09464e-05
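
The alternative mentioned in this answer, injecting the generator or holding it as an instance member, is not shown in the code above. A minimal sketch of that design, written in Python for brevity, might look as follows; PiEstimator and its parameters are hypothetical names and do not come from the original post.

```python
import math
import random


class PiEstimator:
    """Hypothetical helper that owns one generator for its whole lifetime."""

    def __init__(self, rng=None):
        # The generator is injected by the caller; if none is given,
        # one is created here, once, rather than on every draw.
        self.rng = rng if rng is not None else random.Random()

    def estimate(self, n):
        rand = self.rng.random  # bind the method once, outside the loop
        inside = 0
        for _ in range(n):
            x = rand()
            y = rand()
            if x * x + y * y <= 1.0:
                inside += 1
        return 4 * inside / n


if __name__ == "__main__":
    estimator = PiEstimator(random.Random(42))  # fixed seed for repeatability
    pi = estimator.estimate(1_000_000)
    print(pi, abs(pi - math.pi))
```

Injecting the generator keeps seeding under the caller's control, so a run can be made reproducible with a fixed seed without relying on static or global state.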



Answer 3:


This is not meant as an answer to your question of why Python is faster; it just shows that Python can get even faster and neater for this problem.

Two possibilities to speed things up in Python:

Use NumPy vectorization:

import numpy as np

def pi(N):
    x, y = np.random.uniform(-1, 1, size=(2, N))
    in_circle = np.count_nonzero(x**2 + y**2 <= 1)
    return 4 * in_circle / N

And/or Numba just-in-time compilation:

from numba import jit
import random

@jit
def pi(N):
    in_circle = 0
    for i in range(N):
        x = 2 * random.random() - 1
        y = 2 * random.random() - 1

        if x**2 + y**2 <= 1:
            in_circle += 1
    return 4 * in_circle / N


Source: https://stackoverflow.com/questions/37254061/why-is-my-python-3-implementation-much-faster-than-the-one-i-wrote-in-c
