Question
I have a large sparse SciPy matrix (~40k by 100k). I would like to sort each row in descending order and take the top K values (~20-50) for every row. I also need the original column index of each value, since each column in the matrix represents a word/feature (in my case, I am using scikit-learn to compute tf-idf values).
The result, 40k rows by K values, won't be as large, so I can then do operations such as .toarray(), but I am not sure what would be the most efficient way of doing the argsort(axis=1) for every row, grabbing the values along with their indices, and storing this new array. One idea I have is to use key:value pairs for each element.
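To make the per-row idea concrete, here is a rough sketch of what I have in mind, not a tested or tuned solution. It assumes the matrix can be converted to CSR format and that all stored values are non-negative (as tf-idf values are), so implicit zeros never need to outrank a stored entry. The function name top_k_per_row and the convention of padding short rows with value 0.0 and column -1 are my own placeholders.

```python
import numpy as np
from scipy import sparse


def top_k_per_row(matrix, k):
    """Return (values, columns), each of shape (n_rows, k), sorted descending per row.

    Assumption: stored values are non-negative. Rows with fewer than k stored
    entries are padded with value 0.0 and column index -1.
    """
    csr = sparse.csr_matrix(matrix)
    n_rows = csr.shape[0]
    values = np.zeros((n_rows, k), dtype=csr.dtype)
    columns = np.full((n_rows, k), -1, dtype=np.int64)

    for i in range(n_rows):
        # Slice out the stored entries of row i without densifying it.
        start, end = csr.indptr[i], csr.indptr[i + 1]
        row_data = csr.data[start:end]
        row_cols = csr.indices[start:end]
        if row_data.size == 0:
            continue

        n = min(k, row_data.size)
        if n < row_data.size:
            # Pick the n largest entries without fully sorting the row.
            top = np.argpartition(row_data, -n)[-n:]
        else:
            top = np.arange(row_data.size)

        # Sort only the selected entries, largest first.
        order = top[np.argsort(row_data[top])[::-1]]
        values[i, :n] = row_data[order]
        columns[i, :n] = row_cols[order]

    return values, columns
```

The two returned arrays are dense but only 40k by K, so they should fit in memory comfortably. The Python-level loop over rows is the part I suspect is inefficient.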
With this new array, I would want to do operations such as printing the top matrix element values together with the row each one comes from (a row corresponds to a document) and its original column index, or simply printing out, row by row, all the top values with each value's corresponding index.
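As an illustration of the row-by-row output I am after, something like the following sketch would do. It assumes the top values and their column indices are already available as two arrays of shape (n_rows, K), with -1 marking an empty slot (an assumed convention). The helper name format_top_terms and the optional feature_names list are hypothetical.

```python
import numpy as np


def format_top_terms(values, columns, feature_names=None):
    """Build one text line per row listing each top value with its column index.

    feature_names is an optional (hypothetical) list mapping column index to a
    word; slots whose column index is -1 are treated as empty and skipped.
    """
    lines = []
    for doc_id, (row_vals, row_cols) in enumerate(zip(values, columns)):
        parts = []
        for val, col in zip(row_vals, row_cols):
            if col < 0:
                continue
            label = feature_names[col] if feature_names is not None else str(col)
            parts.append(f"{label}[{col}]={val:.3f}")
        lines.append(f"doc {doc_id}: " + ", ".join(parts))
    return lines


if __name__ == "__main__":
    demo_values = np.array([[0.9, 0.5], [0.3, 0.0]])
    demo_columns = np.array([[3, 1], [0, -1]])
    for line in format_top_terms(demo_values, demo_columns):
        print(line)
```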
Thank you for the help in advance
PS: This is somewhat related to an earlier question I asked about summing each column of a large matrix, then sorting the sums and grabbing the top K values along with the column indices.
Source: https://stackoverflow.com/questions/20297071/sorting-each-row-of-a-large-sparse-saving-top-k-values-column-index