Sorting each row of a large sparse matrix and saving the top K values and column indices

Question

I have a large sparse scipy matrix (~40k by 100k). I would like to sort each row in descending order and slice out the top K values (~20-50) for every row. I also need the original column index of each value, because each column in the matrix represents a word/feature (in my case, I am using scikit-learn to compute tf-idf values).

The result, 40k rows by K values, won't be as large, so I can then do operations such as .toarray() on it. However, I am not sure what the most efficient way is to do the argsort(axis=1) for every row, grab the values along with their indices, and store them in a new array. One idea I have is to use key:value pairs for each element.
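One possible way to approach this, sketched under some assumptions: convert the matrix to CSR format, walk the rows through `indptr`, and use `np.argpartition` so that only the K largest stored entries of each row are fully sorted. The function name `top_k_per_row`, the padding convention (value 0 and column -1 when a row has fewer than K stored entries), and the demo matrix are all illustrative choices, not part of the original question. The sketch only looks at stored entries, which is safe for tf-idf because the values are non-negative.

```python
import numpy as np
from scipy import sparse


def top_k_per_row(matrix, k):
    """Return (values, columns), each of shape (n_rows, k), sorted descending per row.

    Hypothetical helper: rows with fewer than k stored entries are padded
    with value 0 and column index -1.
    """
    csr = sparse.csr_matrix(matrix)
    n_rows = csr.shape[0]
    top_values = np.zeros((n_rows, k), dtype=csr.dtype)
    top_columns = np.full((n_rows, k), -1, dtype=np.int64)

    for row in range(n_rows):
        # Stored entries of this row live in data/indices[start:end].
        start, end = csr.indptr[row], csr.indptr[row + 1]
        row_values = csr.data[start:end]
        row_columns = csr.indices[start:end]

        n_keep = min(k, row_values.size)
        if n_keep == 0:
            continue  # empty row: keep the padding

        if row_values.size > n_keep:
            # Partial selection: positions of the n_keep largest values, unordered.
            candidates = np.argpartition(row_values, -n_keep)[-n_keep:]
        else:
            candidates = np.arange(row_values.size)

        # Fully sort only the selected candidates, largest first.
        order = candidates[np.argsort(row_values[candidates])[::-1]]
        top_values[row, :n_keep] = row_values[order]
        top_columns[row, :n_keep] = row_columns[order]

    return top_values, top_columns


# Small made-up matrix standing in for the tf-idf matrix.
demo = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.9, 0.1],
    [0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]))
values, columns = top_k_per_row(demo, k=2)
print(values)
print(columns)
```

The two dense outputs are only n_rows by K, so for 40k rows and K around 50 they fit comfortably in memory. The Python-level loop over rows is the simplest form of the idea, not necessarily the fastest.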

With this new array, I would want to do things such as printing the largest values in the matrix together with the row each one comes from (each row corresponds to a document) and its original column index, or simply printing, row by row, all the top values with each value's corresponding index.
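As a rough illustration of that kind of reporting, the sketch below assumes the top K values and their column indices are already held in two dense arrays of shape (rows, K), with -1 marking an unused slot. The arrays, the `feature_names` vocabulary, and the `format_row` helper are all made up for the example. With scikit-learn, the real names would typically come from the vectorizer, for instance `TfidfVectorizer.get_feature_names_out()` in recent versions.

```python
import numpy as np

# Made-up inputs: top-2 values per document and their original column indices.
top_values = np.array([[0.9, 0.5],
                       [0.3, 0.0]])
top_columns = np.array([[3, 1],
                        [0, -1]])  # -1 marks a padded (unused) slot
feature_names = np.array(["apple", "banana", "cherry", "date", "elder"])


def format_row(row, values, columns, names):
    """Hypothetical helper: one line of 'word (col j): value' pairs for a document."""
    parts = [
        f"{names[c]} (col {int(c)}): {float(v):.3f}"
        for v, c in zip(values[row], columns[row])
        if c >= 0  # skip padded slots
    ]
    return f"doc {row}: " + ", ".join(parts)


# Row by row: every top value with its original column index.
lines = [format_row(r, top_values, top_columns, feature_names)
         for r in range(top_values.shape[0])]
for line in lines:
    print(line)

# Largest value overall, with the document (row) and column it came from.
best_row, best_slot = np.unravel_index(np.argmax(top_values), top_values.shape)
best_column = top_columns[best_row, best_slot]
print(f"max {top_values[best_row, best_slot]:.3f} "
      f"in doc {best_row}, column {best_column} ({feature_names[best_column]})")
```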

Thank you in advance for the help.

P.S. This is somewhat related to a question I asked here, about summing each column of a large matrix, then sorting the sums and grabbing the top K values along with the column indices.

Source: https://stackoverflow.com/questions/20297071/sorting-each-row-of-a-large-sparse-saving-top-k-values-column-index

Tags

scipy

tf-idf
