PANDAS: int32 overflow? Can't build a pivot table

醉酒成梦 2021-01-19 09:50

I use the pd.pivot_table() method to create a user-item matrix by pivoting the user-item activity data. However, the dataframe is so large that I got compla

4 answers
  •  陌清茗 (original poster)
    2021-01-19 09:57

    If you want movieId as your columns, first sort the dataframe using movieId as the key.

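    Then split the sorted dataframe roughly in half at a row position n, chosen so that all the ratings for any given movie end up in the same subset (that is, n falls on a boundary between two movieId values).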

    subset1 = df[:n] 
    subset2 = df[n:]
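
The answer does not say how to pick n. One possible way, sketched below on made-up toy data, is to start at the midpoint and move the cut back to the first row of whichever movie sits there. The toy values and the use of np.searchsorted are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy ratings data (hypothetical values, for illustration only).
df = pd.DataFrame({
    "userId":  [1, 2, 3, 1, 2, 3, 1],
    "movieId": [10, 10, 20, 20, 30, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 1.0],
})

# Sort by the future column key so each movie's ratings are contiguous.
df = df.sort_values("movieId", kind="stable").reset_index(drop=True)

# Start from the midpoint, then move the cut to the first row of the movie
# found there, so that no movie is split across the two subsets.
mid = len(df) // 2
movie_ids = df["movieId"].to_numpy()
n = int(np.searchsorted(movie_ids, movie_ids[mid], side="left"))

subset1 = df[:n]
subset2 = df[n:]
```

If the movie at the midpoint happens to be the very first movie, n comes out as 0 and subset1 is empty; in that case side="right" moves the cut to the end of that movie instead.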
    

    Now apply pivot_table to each of the subsets:

    matrix1 = subset1.pivot_table(values='rating', index='userId', columns='movieId')
    matrix2 = subset2.pivot_table(values='rating', index='userId', columns='movieId')
    

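    Finally, join matrix1 and matrix2. Use an outer join, because the default left join would drop any user who appears only in subset2:

    complete_matrix = matrix1.join(matrix2, how='outer')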
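
To check that the split-and-join approach gives the same matrix as a single pivot, here is a small end-to-end sketch. The toy data and the hard-coded cut n = 2 are assumptions chosen so that one user appears only in the second half.

```python
import pandas as pd

# Toy ratings data (hypothetical values), already sorted by movieId.
df = pd.DataFrame({
    "userId":  [1, 2, 1, 3, 2, 3],
    "movieId": [10, 10, 20, 20, 30, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

n = 2  # rows 0-1 hold movie 10, so the cut sits on a movieId boundary
subset1 = df[:n]
subset2 = df[n:]

matrix1 = subset1.pivot_table(values="rating", index="userId", columns="movieId")
matrix2 = subset2.pivot_table(values="rating", index="userId", columns="movieId")

# User 3 rated nothing in subset1; an outer join keeps that row,
# whereas the default left join would silently drop it.
complete_matrix = matrix1.join(matrix2, how="outer")
left_only = matrix1.join(matrix2)

# Reference result: pivot the whole (small) dataframe in one go.
direct = df.pivot_table(values="rating", index="userId", columns="movieId")

print(complete_matrix)
print("rows with outer join:", len(complete_matrix))
print("rows with left join: ", len(left_only))
```

On real data the direct pivot is exactly what fails, so this comparison is only useful on a small sample.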
    

    On the other hand, if you want userId as your columns, sort the dataframe using userId as the key and repeat the above process with index='movieId' and columns='userId'.

    Note: be sure to delete subset1, subset2, matrix1 and matrix2 once you're done, or you may run into a MemoryError.
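
A minimal sketch of that cleanup step, again on made-up toy data. The gc.collect() call is an optional addition, not something the answer asks for, and whether memory actually goes back to the operating system depends on the allocator.

```python
import gc
import pandas as pd

# Toy ratings data (hypothetical values), already sorted by movieId.
df = pd.DataFrame({
    "userId":  [1, 2, 1, 3, 2, 3],
    "movieId": [10, 10, 20, 20, 30, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

subset1, subset2 = df[:2], df[2:]
matrix1 = subset1.pivot_table(values="rating", index="userId", columns="movieId")
matrix2 = subset2.pivot_table(values="rating", index="userId", columns="movieId")
complete_matrix = matrix1.join(matrix2, how="outer")

# Drop the names bound to the intermediates so their memory can be reclaimed;
# complete_matrix remains usable afterwards.
del subset1, subset2, matrix1, matrix2
gc.collect()
```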
