I have data in the following format in a .txt file:
UserId WordID
1 20
1 30
1 40
2 25
2 16
3 56
3 44
3 12
I think you can use groupby with apply and tolist, then take values:
print(df.groupby('UserId')['WordID'].apply(lambda x: x.tolist()).values)
[[20, 30, 40] [25, 16] [56, 44, 12]]
Or apply list (thank you, B.M.):
print(df.groupby('UserId')['WordID'].apply(list).values)
[[20, 30, 40] [25, 16] [56, 44, 12]]
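For anyone who wants to run this end to end, here is a minimal sketch. It assumes the file is whitespace-separated with a header row. The io.StringIO buffer only stands in for the .txt file, and sep=r"\s+" is my guess at the delimiter, so adjust both to your actual file.

```python
import io

import pandas as pd

# Stand-in for the .txt file from the question (assumed whitespace-separated).
raw = """UserId WordID
1 20
1 30
1 40
2 25
2 16
3 56
3 44
3 12
"""

# sep=r"\s+" splits on any run of whitespace (an assumption about the file).
# For a real file, pass its path instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One list of WordID values per UserId, keeping the row order within each group.
grouped = df.groupby("UserId")["WordID"].apply(list)

# .tolist() on the resulting Series gives a plain list of lists
# instead of the object array returned by .values.
result = grouped.tolist()
print(result)
```

Here result is a nested Python list with one inner list per UserId, which is often easier to work with than the object array from .values because the groups have different lengths.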
Timings:
df = pd.concat([df]*1000).reset_index(drop=True)
In [358]: %timeit df.groupby('UserId')['WordID'].apply(list).values
1000 loops, best of 3: 1.22 ms per loop
In [359]: %timeit df.groupby('UserId')['WordID'].apply(lambda x: x.tolist()).values
1000 loops, best of 3: 1.23 ms per loop
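If you are not in IPython, a rough equivalent of the comparison above can be sketched with the standard timeit module. The helper names with_list and with_tolist and the repeat count n are mine, chosen only for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same sample data as in the question, built inline.
df = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 3, 3, 3],
    "WordID": [20, 30, 40, 25, 16, 56, 44, 12],
})

# Repeat the frame 1000 times, as in the timings above.
big = pd.concat([df] * 1000).reset_index(drop=True)


def with_list():
    # Variant using the built-in list as the aggregation function.
    return big.groupby("UserId")["WordID"].apply(list).values


def with_tolist():
    # Variant using Series.tolist through a lambda.
    return big.groupby("UserId")["WordID"].apply(lambda x: x.tolist()).values


n = 20  # arbitrary number of repetitions for this sketch
t_list = timeit.timeit(with_list, number=n) / n
t_tolist = timeit.timeit(with_tolist, number=n) / n

print("apply(list):   %.3f ms per call" % (t_list * 1e3))
print("apply(tolist): %.3f ms per call" % (t_tolist * 1e3))
```

Both variants produce the same groups, so any difference you measure is only in call overhead.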