问题
I am working with Python in Bigquery and have a large dataframe df (circa 7m rows). I also have a list lst that holds some dates (say all days in a given month).
I am trying to create an additional column "random_day" in df with a random value from lst in each row.
I tried running a loop and apply function but being quite a large dataset it is proving challenging.
My attempts passed by the loop solution:
df["rand_day"] = ""
for i in a["row_nr"]:
rand_day = sample(day_list,1)[0]
df.loc[i,"rand_day"] = rand_day
And the apply solution, defining first my function and then calling it:
def random_day():
rand_day = sample(day_list,1)[0]
return day
df["rand_day"] = df.apply(lambda row: random_day())
Any tips on this? Thank you
回答1:
Use numpy.random.choice and if necessary convert dates by to_datetime:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
})
day_list = pd.to_datetime(['2015-01-02','2016-05-05','2015-08-09'])
#alternative
#day_list = pd.DatetimeIndex(['2015-01-02','2016-05-05','2015-08-09'])
df["rand_day"] = np.random.choice(day_list, size=len(df))
print (df)
A B rand_day
0 a 4 2016-05-05
1 b 5 2016-05-05
2 c 4 2015-08-09
3 d 5 2015-01-02
4 e 5 2015-08-09
5 f 4 2015-08-09
来源:https://stackoverflow.com/questions/54367361/how-to-assign-random-values-from-a-list-to-a-column-in-a-pandas-dataframe