How to calculate mean, mode, variance, standard deviation etc. of output in python?

Question

I have a simple game based on probabilities. Every day we toss a coin: if we get heads we win $20, and if we get tails we lose $19. At the end of the month (28 days) we see how much we have made or lost.

import random

def coin_tossing_game():
    random_numbers = [random.randint(0, 1) for x in range(500)] #generate 500 random numbers
    for x in random_numbers:
        if x == 0: #if we get heads
            return 20 #we win $20
        elif x == 1: #if we get tails
            return -19 #we lose $19


for a in range(28): #for each of the 28 days of the month
    print(coin_tossing_game())

This returns the output 20 20 -19 -19 -19 -19 -19 20 -19 20 -19 20 -19 20 20 -19 -19 20 20 -19 -19 -19 20 20 20 -19 -19 -19 20 20

This output is exactly what I expected. I want to find the sum of the output and other descriptive statistics such as the mean, mode, median, standard deviation and confidence intervals. So far I have had to copy and paste the data into Excel to do this analysis. I was hoping there was an easy, quick way to do it in Python.

Answer 1:

You're asking how. The most immediately available option is built into Python, in the form of the statistics module of the standard library. But again, you seem to want to know how to do this yourself. The following code shows the basics, which I haven't felt the need to do by hand for almost 50 years.

First, modify your code so that it captures the sample in a list instead of printing it. In my code the list is called sample.

The first part of the code simply exercises the Python library. No sweat there.

The second part of the code shows how to accumulate the sum of the values in the sample, and the sum of the squares of their deviations from the mean. I leave it to you to work out how to calculate the sample variance, sample standard deviation and confidence intervals under the usual assumptions from these statistics. Having sorted and renamed the sample, I calculate the maximum and minimum values (useful for estimation for some distributions). Finally, I calculate the median from the sorted sample. I leave calculation of the mode to you.

import random

def coin_tossing_game():
    random_numbers = [random.randint(0, 1) for x in range(500)] #generate 500 random numbers
    for x in random_numbers:
        if x == 0: #if we get heads
            return 20 #we win $20
        elif x == 1: #if we get tails
            return -19 #we lose $19

sample = []
for a in range(28): #for each of the 28 days of the month
    #~ print(coin_tossing_game())
    sample.append(coin_tossing_game())

## the easy way

import statistics

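print(statistics.mean(sample))
print(statistics.median(sample))
print(statistics.mode(sample)) #before Python 3.8 this raises StatisticsError if 20 and -19 occur equally often
print(statistics.stdev(sample)) #sample standard deviation (divides by N-1)
print(statistics.variance(sample)) #sample variance (divides by N-1)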

## the hard way

sample.sort()
orderedSample = sample
N = len(sample)
minSample = orderedSample[0]
maxSample = orderedSample[-1]
sumX = 0
for x in sample:
    sumX += x
mean = sumX / N

sumDeviates2 = 0
for x in sample:
    sumDeviates2 += ( x-mean )**2

#median: the middle value of the sorted sample, or the average of the two middle values
k = N//2
if N%2==0:
    median = 0.5* (orderedSample[k]+orderedSample[k-1])
else:
    median = orderedSample[k]
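
The quantities that this answer leaves as an exercise can be derived from the two accumulated sums. The following is only a sketch under stated assumptions: the fixed seed, the one-line sample generation, the 1.96 multiplier (a normal approximation for a 95% interval on the mean) and the names variance, stdev, standard_error, ci_low, ci_high and modes are illustrative choices, not part of the original answer. The statistics module is imported only to cross-check the hand-computed values.

```python
import math
import random
import statistics  # used only to cross-check the hand-computed values
from collections import Counter

random.seed(0)  # assumption: fixed seed so the sketch is reproducible

# 28 daily results: +20 for heads (0), -19 for tails (1)
sample = [20 if random.randint(0, 1) == 0 else -19 for _ in range(28)]

N = len(sample)
sumX = sum(sample)                                   # total won or lost
mean = sumX / N
sumDeviates2 = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations

# Sample variance divides by N - 1 rather than N
variance = sumDeviates2 / (N - 1)
stdev = math.sqrt(variance)

# Approximate 95% confidence interval for the mean.
# Assumption: normal approximation with z = 1.96; for a sample this small
# a Student t multiplier would give a slightly wider interval.
standard_error = stdev / math.sqrt(N)
ci_low = mean - 1.96 * standard_error
ci_high = mean + 1.96 * standard_error

# Mode: the most frequent value(s); a list, because 20 and -19 can tie
counts = Counter(sample)
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)

print("sum:", sumX)
print("mean:", mean)
print("variance:", variance, "library:", statistics.variance(sample))
print("stdev:", stdev, "library:", statistics.stdev(sample))
print("95% CI for the mean:", (ci_low, ci_high))
print("mode(s):", modes)
```

Returning the modes as a list avoids having to pick a winner when heads and tails occur equally often, which is a real possibility with an even number of days.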

Answer 2:

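Yes, there is: install numpy and scipy, then use the functions numpy.mean, numpy.std, numpy.median and scipy.stats.mode. Note that numpy.std returns the population standard deviation by default; pass ddof=1 to get the sample standard deviation.

The scipy.stats module also provides various common significance tests.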
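
As a rough illustration of the functions named in this answer, the sketch below builds one 28-day sample and summarises it. It assumes numpy and scipy are installed. The seeded generator, the use of numpy.where to map coin flips to payoffs, and the Student t interval via scipy.stats.sem and scipy.stats.t.interval are illustrative choices that the answer itself does not mention.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility

# 28 daily results: +20 for heads (0), -19 for tails (1)
flips = rng.integers(0, 2, size=28)  # upper bound is exclusive, so 0 or 1
sample = np.where(flips == 0, 20, -19)

total = sample.sum()
mean = np.mean(sample)
median = np.median(sample)
std_sample = np.std(sample, ddof=1)  # ddof=1 gives the sample standard deviation
var_sample = np.var(sample, ddof=1)  # ddof=1 gives the sample variance

# scipy.stats.mode returns the mode as an array in older SciPy releases and
# as a scalar in newer ones; atleast_1d handles both cases.
mode_result = stats.mode(sample)
mode_value = np.atleast_1d(mode_result.mode)[0]

# 95% confidence interval for the mean, based on the Student t distribution
sem = stats.sem(sample)  # standard error of the mean (ddof=1 by default)
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)

print("sum:", total)
print("mean:", mean)
print("median:", median)
print("mode:", mode_value)
print("sample std:", std_sample)
print("sample variance:", var_sample)
print("95% CI for the mean:", (ci_low, ci_high))
```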

Answer 3:

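Use the scipy.stats module: scipy.stats.mode gives the mode, scipy.stats.mstats.median_cihs gives a confidence interval for the median, and scipy.stats.trim_mean gives a trimmed mean (with proportiontocut=0 it reduces to the ordinary mean). You could also use the standard-library statistics module and its mean(), median(), and mode() functions.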

Source: https://stackoverflow.com/questions/42177007/how-to-calculate-mean-mode-variance-standard-deviation-etc-of-output-in-pyth

Tags

python

random

simulation

python-3.5

montecarlo