How to generate random data off of existing sample data?

Submitted by 假如想象 on 2021-01-29 03:04:32

Question


I have a set of existing data, let's say:

sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]

From this sample data, I would like to generate a random set of data of a given length. The values should not be drawn from the sample data itself, but from a distribution fitted to the sample data.

Expected output if I wanted 5 random points:

output_data = [3.4,2.3,1.5,5.2,1.3]


Answer 1:


Use random.sample, which picks items from the existing data without replacement:

import random

sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
# select 5 items from the data above (drawn without replacement)
print(random.sample(sample_data, 5))

Output:

[3, 2, 2, 4, 2]
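
Because random.sample draws without replacement, it cannot return more items than sample_data contains. If more points are needed, or if each draw should follow the empirical frequencies of the sample independently, one option is random.choices, which draws with replacement. The following is only a minimal sketch; the seed value and the name resampled are illustrative assumptions, not part of the original answer.

```python
import random

sample_data = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]

# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = random.Random(0)

# random.choices draws WITH replacement, so k may exceed len(sample_data).
# Each value is picked with probability proportional to how often it
# appears in the list.
resampled = rng.choices(sample_data, k=20)
print(resampled)
```

Like random.sample, this still only returns values that already occur in the sample (2, 3 or 4), so it does not produce new fractional values.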



Answer 2:


import numpy as np
length = 3
sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]

np.random.choice(sample_data, length, replace=False)  # sampling without replacement
Out[287]: array([4, 4, 2])



Answer 3:


There's an important premise of the question that needs to be decided first: what kind of distribution do you want? With enough data, humans can often classify a distribution by its shape, but a machine cannot; assigning a distribution type, say uniform or binomial, to new input is an arbitrary choice. Here I'll provide a brief answer using the normal distribution, a common default in statistics. (The Central Limit Theorem says that the means of sufficiently large samples are approximately normally distributed; it does not guarantee that the data themselves are normal, so treat this as a modelling assumption.)

import numpy as np

sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
size = 5
new_samples = np.random.normal(np.mean(sample_data), np.std(sample_data), size)

>>> new_samples
array([ 2.01221231,  2.62772975,  1.79965428,  3.83601719,  2.44967777])

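The new samples are drawn from a normal distribution that takes its mean and standard deviation from the original samples. Note that np.std computes the population standard deviation by default (ddof=0); pass ddof=1 to use the sample estimate instead.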
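
The same normal-fit idea can also be sketched with only the Python standard library (3.8+), using statistics.NormalDist. This is an illustrative alternative rather than the answer's own code: the names fitted and new_samples and the seed value are assumptions. NormalDist.from_samples estimates sigma with the sample standard deviation (n - 1 in the denominator), so it differs slightly from np.std with its default settings.

```python
from statistics import NormalDist

sample_data = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]

# Fit a normal distribution to the data: mu is the mean of the sample,
# sigma is the sample standard deviation (statistics.stdev).
fitted = NormalDist.from_samples(sample_data)

# Draw 5 new values from the fitted distribution.
# The seed is arbitrary and only makes the run repeatable.
new_samples = fitted.samples(5, seed=0)

print(fitted.mean, fitted.stdev)
print(new_samples)
```

The drawn values are floats and are not restricted to 2, 3 or 4, which matches the kind of output asked for in the question. Whether a normal distribution is a reasonable model for the data remains a choice the user has to make.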



Source: https://stackoverflow.com/questions/54484313/how-to-generate-random-data-off-of-existing-sample-data
