Question
I need to add some 'noise' to my data, so I would like to add a different random number to every cell in my pandas dataframe. This code works, but seems unpythonic. Is there a better way?
import pandas as pd
import numpy as np
df = pd.DataFrame(0.0, index=[1,2,3,4,5], columns=list('ABC') )
print(df)
for x, line in df.iterrows():
    for col in df:
        line[col] = line[col] + (np.random.rand() - 0.5) / 1000.0
print(df)
Answer 1:
df + np.random.rand(*df.shape) / 10000.0
OR
Alternatively, use applymap (renamed to DataFrame.map in pandas 2.1):
df = pd.DataFrame(1.0, index=[1,2,3,4,5], columns=list('ABC') )
df.applymap(lambda x: x + np.random.rand()/10000.0)
output:
A \
1 [[1.00006953418, 1.00009164785, 1.00003177706]...
2 [[1.00007291245, 1.00004186046, 1.00006935173]...
3 [[1.00000490127, 1.0000633115, 1.00004117181],...
4 [[1.00007159622, 1.0000559506, 1.00007038891],...
5 [[1.00000980335, 1.00004760836, 1.00004214422]...
B \
1 [[1.00000320322, 1.00006981682, 1.00008912557]...
2 [[1.00007443802, 1.00009270815, 1.00007225764]...
3 [[1.00001371778, 1.00001512412, 1.00007986851]...
4 [[1.00005883343, 1.00007936509, 1.00009523334]...
5 [[1.00009329606, 1.00003174878, 1.00006187704]...
C
1 [[1.00005894836, 1.00006592776, 1.0000171843],...
2 [[1.00009085391, 1.00006606979, 1.00001755092]...
3 [[1.00009736701, 1.00007240762, 1.00004558753]...
4 [[1.00003981393, 1.00007505714, 1.00007209959]...
5 [[1.0000031608, 1.00009372917, 1.00001960112],...
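The vectorized form can also be wrapped in a small helper that reproduces the noise range from the question (uniform in roughly ±0.0005). This is only a sketch: the name `add_uniform_noise` and its `scale` and `seed` parameters are made up for illustration, and it assumes NumPy 1.17 or later for `np.random.default_rng`.

```python
import numpy as np
import pandas as pd


def add_uniform_noise(df, scale=1e-3, seed=None):
    """Return a copy of df with independent uniform noise added to every cell.

    The noise is drawn from [-scale/2, scale/2). `scale` and `seed` are
    illustrative parameters, not part of any pandas API.
    """
    rng = np.random.default_rng(seed)
    # One draw per cell: the array has exactly the shape of the frame.
    noise = (rng.random(df.shape) - 0.5) * scale
    # Adding a same-shaped array keeps the original index and columns.
    return df + noise


df = pd.DataFrame(0.0, index=[1, 2, 3, 4, 5], columns=list("ABC"))
noisy = add_uniform_noise(df, seed=0)
print(noisy)
```

Because the whole noise array is generated in one call, no Python-level loop over rows or cells is needed, and passing a seed makes the result repeatable.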
Answer 2:
This is a more succinct, equivalent method:
In [147]:
df = pd.DataFrame((np.random.rand(5,3) - 0.5)/1000.0, columns=list('ABC'))
df
Out[147]:
          A         B         C
0  0.000381 -0.000167  0.000020
1  0.000482  0.000007 -0.000281
2 -0.000032 -0.000402 -0.000251
3 -0.000037 -0.000319  0.000260
4 -0.000035  0.000178  0.000166
If you're doing this to an existing df with non-zero values, then use add:
In [149]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df
Out[149]:
          A         B         C
0 -1.705644  0.149067  0.835378
1 -0.956335 -0.586120  0.212981
2  0.550727 -0.401768  1.421064
3  0.348885  0.879210  0.136858
4  0.271063  0.132579  1.233789
In [154]:
df.add((np.random.rand(df.shape[0], df.shape[1]) - 0.5)/1000.0)
Out[154]:
          A         B         C
0 -1.705459  0.148671  0.835761
1 -0.956745 -0.586382  0.213339
2  0.550368 -0.401651  1.421515
3  0.348938  0.878923  0.136914
4  0.270864  0.132864  1.233622
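As a quick sanity check on the add approach, the difference between the noisy frame and the original can be inspected directly. The following is a minimal sketch; the variable names `noise`, `noisy` and `diff` are illustrative, and the seed is there only to make the run repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seeded only so the example is repeatable

df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# One uniform draw per cell, centred on zero, in roughly [-0.0005, 0.0005).
noise = (np.random.rand(*df.shape) - 0.5) / 1000.0

# DataFrame.add with a same-shaped array behaves like the + operator.
noisy = df.add(noise)

# Largest absolute change made to any single cell.
diff = (noisy - df).abs()
print(diff.max().max())
```

The printed maximum should stay at or below about 0.0005, which confirms that every cell was perturbed only within the intended range.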
Answer 3:
For nonzero data:
df + (np.random.rand(*df.shape) - 0.5) * 0.001
OR
df + np.random.uniform(-0.01, 0.01, df.shape)
For cases where your data frame contains zeros that you wish to keep as zero:
df * (1 + (np.random.rand(*df.shape) - 0.5) * 0.001)
OR
df * (1 + np.random.uniform(-0.01, 0.01, df.shape))
I think either of these should work. The idea is to generate an array of random values with the same shape as your existing df and add it to the df (or multiply the df by 1 + random where you want zeros to remain zero). Note that np.random.rand takes the dimensions as separate arguments, hence *df.shape, while np.random.uniform takes the shape as a single tuple. With the uniform function you can set the scale of the noise by changing the 0.01 bounds.
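The difference between additive and multiplicative noise is easiest to see on a frame that actually contains zeros. Here is a small sketch under assumed data: the example frame, the `scale` variable and the seeded generator (`np.random.default_rng`, NumPy 1.17 or later) are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded only for repeatability

# Hypothetical data with a few exact zeros.
df = pd.DataFrame({"A": [0.0, 1.0, 2.0], "B": [3.0, 0.0, 5.0]})
scale = 0.01  # half-width of the uniform noise

# Additive noise: every cell moves, including the zeros.
additive = df + rng.uniform(-scale, scale, df.shape)

# Multiplicative noise: each cell is scaled by a factor near 1,
# so cells that were exactly zero stay exactly zero.
multiplicative = df * (1 + rng.uniform(-scale, scale, df.shape))

zeros = df == 0
print(additive)
print(multiplicative)
```

With the multiplicative form the perturbation is relative, so larger values receive proportionally larger noise, whereas the additive form applies the same absolute range to every cell.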
Source: https://stackoverflow.com/questions/43787650/add-a-different-random-number-to-every-cell-in-a-pandas-dataframe