I use the pd.pivot_table() method to create a user-item matrix by pivoting the user-item activity data. However, the dataframe is so large that I got compla
Some Solutions:
df.groupby('EVENT_ID')['DIAGNOSIS'].apply(list).to_dict()
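To illustrate what the one-liner above does, here is a minimal sketch on made-up data. The column names come from the snippet; the values and the variable name `diagnoses_by_event` are assumptions for illustration only.

```python
import pandas as pd

# Toy data (hypothetical values); only the column names come from the snippet.
df = pd.DataFrame({
    "EVENT_ID":  [1, 1, 2],
    "DIAGNOSIS": ["A", "B", "C"],
})

# Collect every DIAGNOSIS value per EVENT_ID into a list,
# then turn the resulting Series into a plain dict.
diagnoses_by_event = df.groupby("EVENT_ID")["DIAGNOSIS"].apply(list).to_dict()
print(diagnoses_by_event)
```

The result maps each EVENT_ID to the list of its DIAGNOSIS values, in the order they appear in the dataframe.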
If you want movieId as your columns, first sort the dataframe by movieId.
Then split the dataframe roughly in half at a row position n, chosen so that all the ratings for any given movie end up in the same subset.
subset1 = df[:n]
subset2 = df[n:]
Now apply pivot_table to each of the subsets:
matrix1 = subset1.pivot_table(values='rating', index='userId', columns='movieId')
matrix2 = subset2.pivot_table(values='rating', index='userId', columns='movieId')
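Finally, join matrix1 and matrix2. Use an outer join, because the default left join would drop users who appear only in subset2:
complete_matrix = matrix1.join(matrix2, how='outer')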
On the other hand, if you want userId as your columns, sort the dataframe using userId as the key and repeat the above process.
Note: be sure to delete subset1, subset2, matrix1 and matrix2 once you're done, or you may end up with a MemoryError.
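The split-and-pivot steps can be sketched end to end as follows. This is only an illustration on made-up data: the helper name `pivot_in_two_parts` is hypothetical, and instead of a hand-picked row position n it splits on a movieId boundary value, which guarantees that no movie straddles both halves.

```python
import pandas as pd

# Toy data; column names follow the answer, the values are made up.
df = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [10, 20, 10, 30, 20, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 1.0],
})

def pivot_in_two_parts(df):
    """Pivot two halves separately, splitting on a movieId boundary (hypothetical helper)."""
    df = df.sort_values("movieId").reset_index(drop=True)
    # unique() preserves order of appearance, so the ids are sorted here.
    movie_ids = df["movieId"].unique()
    boundary = movie_ids[len(movie_ids) // 2]
    # Splitting on the boundary value keeps each movie's ratings in one subset.
    subset1 = df[df["movieId"] < boundary]
    subset2 = df[df["movieId"] >= boundary]
    matrix1 = subset1.pivot_table(values="rating", index="userId", columns="movieId")
    matrix2 = subset2.pivot_table(values="rating", index="userId", columns="movieId")
    # how="outer" keeps users that appear in only one of the halves.
    return matrix1.join(matrix2, how="outer")

complete_matrix = pivot_in_two_parts(df)
print(complete_matrix)
```

Because the intermediate frames live inside the function, they go out of scope when it returns, which has the same effect as deleting them by hand.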
You can use groupby instead. Try this code:
reviews.groupby(['userId','movieId'])['rating'].max().unstack()
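A small sketch of the groupby approach on made-up data; the values and the variable name `user_item` are assumptions. Note that this version aggregates duplicate (userId, movieId) pairs with max, whereas pivot_table averages them by default. Whether it avoids the original error will depend on the size of the data.

```python
import pandas as pd

# Toy data (hypothetical values); user 2 has rated movie 10 twice.
reviews = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 2],
    "movieId": [10, 20, 10, 10, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 1.0],
})

# Group by the (userId, movieId) pair, keep the highest rating per pair,
# then move movieId from the row index into the columns.
user_item = reviews.groupby(["userId", "movieId"])["rating"].max().unstack()
print(user_item)
```

Pairs with no rating show up as NaN in the resulting user-item matrix.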
There is not much you can do about an integer overflow inside library code. You basically have three options:
You don't provide much code, so I cannot tell which solution is best for you.