Question
I would like to construct sequences of each user's purchasing history using dictionaries in Python. I would like these sequences to be ordered by date.
I have 3 columns in my dataframe:
users items date
1 1 date_1
1 2 date_2
2 1 date_3
2 3 date_1
4 5 date_2
4 1 date_5
4 3 date_3
And the result should look like this:
{1: [[1,date_1],[2,date_2]], 2:[[3,date_1],[1,date_3]], 4:[[5,date_2],[3,date_3],[1,date_5]]}
My code is :
df_sub = df[['uid', 'nid', 'date']]
dic3 = df_sub.set_index('uid').T.to_dict('list')
And my results are :
{36864: [258509L, '2014-12-03'], 548873: [502105L, '2015-09-08'], 42327: [492268L, '2015-01-29'], 548873: [370049L, '2015-02-18'], 36864: [258909L, '2016-01-13'] ... }
But I would like to group by users :
{36864: [[258509L, '2014-12-03'],[258909L, '2016-01-13']], 548873: [[502105L, '2015-09-08'],[370049L, '2015-02-18']], 42327: [492268L, '2015-01-29'] }
Some help, please!
Answer 1:
First, set users as the index and group by it. Then pass a function that sorts each group by its date column and extracts the underlying array with .values. Calling .tolist converts that array to its list equivalent, which gives the required format. Finally, use .to_dict to get the final output as a dictionary.
fnc = lambda x: x.sort_values('date').values.tolist()
df.set_index('users').groupby(level=0).apply(fnc).to_dict()
produces:
{1: [[1, 'date_1'], [2, 'date_2']],
2: [[3, 'date_1'], [1, 'date_3']],
4: [[5, 'date_2'], [3, 'date_3'], [1, 'date_5']]}
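For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It is a variant rather than the exact code above: it sorts the whole frame once by user and date, then builds the dictionary while iterating over the groups. The sample dates and the helper name purchase_sequences are made up for illustration, and it assumes the dates are ISO-formatted strings (or datetimes), so that sorting them gives chronological order.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
# Dates are ISO-formatted strings, so string order equals date order.
df = pd.DataFrame({
    "users": [1, 1, 2, 2, 4, 4, 4],
    "items": [1, 2, 1, 3, 5, 1, 3],
    "date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-01",
             "2015-01-02", "2015-01-05", "2015-01-03"],
})

def purchase_sequences(frame):
    """Return {user: [[item, date], ...]} with each list ordered by date."""
    # Sort once; groupby keeps the row order inside each group.
    ordered = frame.sort_values(["users", "date"])
    return {
        user: group[["items", "date"]].values.tolist()
        for user, group in ordered.groupby("users")
    }

sequences = purchase_sequences(df)
print(sequences)
```

If the date column holds strings in another format, converting it first with pd.to_datetime would be the safer choice, since plain string sorting is only chronological for ISO-style dates.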
Source: https://stackoverflow.com/questions/41330030/construct-sequences-from-a-dataframe-using-dictionaries-in-python