PANDAS: int32 overflow? Can't build a pivot table

删除回忆录丶 submitted on 2020-05-15 05:49:10

Question


I use pd.pivot_table() to create a user-item matrix by pivoting user-item activity data. However, the DataFrame is so large that I get this error:

Unstacked DataFrame is too big, causing int32 overflow

Any suggestions on solving this problem? Thanks!

r_matrix = df.pivot_table(values='rating', index='userId', columns='movieId')

Answer 1:


There is not much you can do about an integer overflow inside library code. You have basically three options:

  1. Change the input so that the overflow does not occur. You probably need to make the input smaller in some sense. If that does not help, you may be using the library incorrectly, or you may have hit a bug in it.
  2. Use a different library (or none at all); the library you are using may not be intended for input this large.
  3. Modify the library itself so that it can handle your input. This may be hard, but if you submit a pull request to the library, many people will benefit from it.

You have not provided much code, so I cannot tell which solution is best for you.
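
As a rough illustration of option 1, the sketch below estimates how many cells the pivoted matrix would have and then keeps only users and movies with enough ratings before pivoting. The column names follow the question; the helper names `estimated_cells` and `filter_sparse`, the thresholds, and the toy data are hypothetical. The assumption that the limit is about 2**31 - 1 cells is inferred from the error message, not taken from the pandas source.

```python
import numpy as np
import pandas as pd

# Assumed limit: the largest value a signed 32-bit integer can hold.
INT32_MAX = np.iinfo(np.int32).max


def estimated_cells(df, index="userId", columns="movieId"):
    """Number of cells the dense pivoted matrix would contain."""
    return int(df[index].nunique()) * int(df[columns].nunique())


def filter_sparse(df, min_user_ratings, min_movie_ratings):
    """Keep rows whose user and movie both have enough ratings (single pass)."""
    user_counts = df.groupby("userId")["rating"].transform("count")
    movie_counts = df.groupby("movieId")["rating"].transform("count")
    keep = (user_counts >= min_user_ratings) & (movie_counts >= min_movie_ratings)
    return df[keep]


# Toy data standing in for the real ratings table.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

print(estimated_cells(df), "cells; assumed limit:", INT32_MAX)

# Hypothetical thresholds; tune them until the estimate fits.
small = filter_sparse(df, min_user_ratings=2, min_movie_ratings=2)
r_matrix = small.pivot_table(values="rating", index="userId", columns="movieId")
```

Whether dropping rarely rated movies or inactive users is acceptable depends on what the matrix is used for, so treat the filter as one possible way of "making the input smaller", not the only one.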




Answer 2:


Some solutions:

  • Downgrade pandas to version 0.21, which has no problem pivoting large data.
  • Convert your data to a dictionary instead, for example df.groupby('EVENT_ID')['DIAGNOSIS'].apply(list).to_dict()
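
The dictionary one-liner above uses column names (`EVENT_ID`, `DIAGNOSIS`) that do not appear in the question. A sketch of the same idea with the question's columns might look like the following; the toy data and the names `movies_by_user` and `ratings_by_user` are hypothetical.

```python
import pandas as pd

# Toy data standing in for the real ratings table.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Same pattern as the answer: one list of rated movies per user.
movies_by_user = df.groupby("userId")["movieId"].apply(list).to_dict()

# Variant that keeps the ratings as {userId: {movieId: rating}}.
ratings_by_user = {
    user: dict(zip(group["movieId"], group["rating"]))
    for user, group in df.groupby("userId")
}

print(movies_by_user)
print(ratings_by_user)
```

Both structures store only the ratings that exist, so no dense user-by-movie matrix is ever built.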



Answer 3:


You can use groupby instead. Try this code (here reviews is the ratings DataFrame):

reviews.groupby(['userId','movieId'])['rating'].max().unstack()
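
To see what this produces, the sketch below runs the groupby version and pivot_table on a small, made-up `reviews` table. Note that the groupby version aggregates duplicate (userId, movieId) pairs with max(), whereas pivot_table uses the mean by default. Since unstack() performs a similar reshape, it may still hit the same size limit on very large inputs; that is an assumption worth checking against your pandas version.

```python
import pandas as pd

# Toy data with one duplicated (userId, movieId) pair: user 2, movie 10.
reviews = pd.DataFrame({
    "userId": [1, 1, 2, 2, 2],
    "movieId": [10, 20, 10, 10, 30],
    "rating": [4.0, 3.5, 5.0, 3.0, 2.0],
})

# The answer's approach: aggregate with max, then move movieId to the columns.
via_groupby = reviews.groupby(["userId", "movieId"])["rating"].max().unstack()

# pivot_table with the same aggregation, for comparison.
via_pivot = reviews.pivot_table(
    values="rating", index="userId", columns="movieId", aggfunc="max"
)

# pivot_table as written in the question (default aggregation is the mean).
default_pivot = reviews.pivot_table(
    values="rating", index="userId", columns="movieId"
)

print(via_groupby)
print(default_pivot)
```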


Source: https://stackoverflow.com/questions/56790261/pandas-int32-overflow-cant-bulid-a-pivot-table
