Question
I have a set of existing data, let's say:
sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
From this sample data, I would like to generate a random set of data of a given length. The new values should not be drawn directly from the sample data, but from a distribution fitted to the sample data.
Expected output if I wanted 5 random points:
output_data = [3.4,2.3,1.5,5.2,1.3]
Answer 1:
Use random.sample, which picks items from the existing data without replacement:
import random
sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
# if you want to select 5 samples from above data
print(random.sample(sample_data, 5))
Output:
[3, 2, 2, 4, 2]
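Because random.sample draws without replacement, it cannot return more points than the sample contains. A minimal sketch of the with-replacement variant using the standard library's random.choices follows; the helper name resample_with_replacement and the seed argument are illustrative assumptions, not part of the original answer. Like random.sample, it still only returns values that already exist in the data.

```python
import random

sample_data = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]

def resample_with_replacement(data, n, seed=None):
    """Draw n items from data with replacement (hypothetical helper)."""
    rng = random.Random(seed)  # private generator so the seed stays local
    return rng.choices(data, k=n)

# n may exceed len(sample_data) because items can repeat
drawn = resample_with_replacement(sample_data, 20, seed=0)
print(drawn)
```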
Answer 2:
import numpy as np
length = 3
sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
np.random.choice(sample_data, length, replace=False)  # sampling without replacement
Out[287]: array([4, 4, 2])
Answer 3:
An important premise of the question needs to be decided first: what kind of distribution do you want? With enough data, a human can usually classify a distribution by its shape, but a program cannot, so assigning a distribution type (say, uniform or binomial) to new input is an arbitrary choice. Here I'll give a brief answer using the most common default in statistics, the normal distribution. (The Central Limit Theorem says that the means of sufficiently large samples are approximately normally distributed; it does not guarantee that the data itself is normal, so treat this as a modeling assumption.)
import numpy as np
sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
size = 5
new_samples = np.random.normal(np.mean(sample_data), np.std(sample_data), size)
>>> new_samples
array([ 2.01221231, 2.62772975, 1.79965428, 3.83601719, 2.44967777])
The new samples are drawn from a normal distribution that uses the mean and standard deviation of the original samples.
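If a normal shape is too strong an assumption, one alternative (swapped in here, not part of the original answer) is a smoothed bootstrap: pick an existing point at random and add a small amount of Gaussian noise, which amounts to sampling from a Gaussian kernel density estimate of the data. The sketch below uses only the standard library; the function name smoothed_bootstrap, its parameters, and the default bandwidth (a normal-reference rule of thumb) are assumptions made for illustration.

```python
import random
import statistics

sample_data = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]

def smoothed_bootstrap(data, n, bandwidth=None, seed=None):
    """Draw n floats from a Gaussian-kernel estimate of data (hypothetical helper)."""
    rng = random.Random(seed)
    if bandwidth is None:
        # Assumed default: normal-reference rule, 1.06 * s * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    # Each draw is an existing point plus zero-mean Gaussian noise
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n)]

new_points = smoothed_bootstrap(sample_data, 5, seed=42)
print(new_points)
```

A larger bandwidth gives smoother, more spread-out output; a bandwidth near zero collapses back to plain resampling of the original values.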
Source: https://stackoverflow.com/questions/54484313/how-to-generate-random-data-off-of-existing-sample-data