How to read/traverse/slice Scipy sparse matrices (LIL, CSR, COO, DOK) faster?

悲哀的现实 2021-01-06 18:44

To manipulate SciPy sparse matrices, the built-in methods are typically used. But sometimes you need to read the matrix data to assign it to non-sparse data types. For the sake of

2 Answers

小蘑菇 (OP)

2021-01-06 19:03
A similar question, but dealing with setting sparse values rather than just reading them:

Efficient incremental sparse matrix in python / scipy / numpy

More on accessing values using the underlying representation

Efficiently select random non-zero column from each row of sparse matrix in scipy

Also

why is row indexing of scipy csr matrices slower compared to numpy arrays

Why are lil_matrix and dok_matrix so slow compared to common dict of dicts?
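For CSR, "the underlying representation" means three arrays: indptr, indices and data, where the stored entries of row i sit in the slice indptr[i]:indptr[i+1]. The sketch below reads a CSR matrix that way; `iter_csr` is a hypothetical helper name and the random input is only for illustration. Being a pure-Python loop, it is mainly useful for showing the layout, not for speed.

```python
import numpy as np
from scipy import sparse

def iter_csr(m):
    """Yield (row, col, value) for every stored entry of a CSR matrix,
    reading indptr/indices/data directly instead of converting to COO."""
    indptr, indices, data = m.indptr, m.indices, m.data
    for i in range(m.shape[0]):
        # Row i's stored entries occupy positions indptr[i] .. indptr[i+1]-1.
        for k in range(indptr[i], indptr[i + 1]):
            yield i, indices[k], data[k]

# Hypothetical example input.
M = sparse.random(5, 6, density=0.3, format="csr")
dense = np.zeros(M.shape, dtype=M.dtype)
for i, j, v in iter_csr(M):
    dense[i, j] = v
```

Inside a per-row loop, `data[indptr[i]:indptr[i+1]]` gives row i's values as one slice, which avoids the inner Python loop when only row-level access is needed.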

Take a look at what M.nonzero does:
```
    A = self.tocoo()
    nz_mask = A.data != 0
    return (A.row[nz_mask],A.col[nz_mask])
```
It converts the matrix to COO format and returns its .row and .col attributes, after filtering out any 'stray' 0s (explicitly stored zeros) in the .data attribute.
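A small sketch of that filtering step, assuming a hand-built COO matrix in which one stored entry is an explicit zero: the raw COO attributes still contain it, while nonzero() drops it through the same data != 0 mask.

```python
import numpy as np
from scipy import sparse

# Three stored entries; the one at (1, 0) is an explicit "stray" zero.
row = np.array([0, 1, 2])
col = np.array([1, 0, 2])
data = np.array([5.0, 0.0, 7.0])
M = sparse.coo_matrix((data, (row, col)), shape=(3, 3))

stored = M.nnz            # counts stored entries, explicit zeros included
r, c = M.nonzero()        # explicit zeros are filtered out here

# Manual equivalent of nonzero(), mirroring the source excerpt.
mask = M.data != 0
manual_r, manual_c = M.row[mask], M.col[mask]
```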

So you could skip the middleman and use those attributes directly:
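```
import numpy as np
from scipy import sparse
lil = sparse.random(100, 100, density=0.1, format='lil')  # example input
a = np.zeros(lil.shape, dtype=lil.dtype)  # dense target array
A = lil.tocoo()  # COO exposes .row, .col and .data directly
for i, j, d in zip(A.row, A.col, A.data):
    a[i, j] = d
```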
This is almost as fast as toarray():
```
In [595]: %%timeit
   .....: aa = M.tocoo()
   .....: for i,j,d in zip(aa.row,aa.col,aa.data):
   .....:   A[i,j]=d
   .....: 
100 loops, best of 3: 14.3 ms per loop

In [596]: timeit  arr=M.toarray()
100 loops, best of 3: 12.3 ms per loop
```
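To repeat this comparison outside IPython, here is a sketch using the standard timeit module. The shape and density of M are assumptions, since the answer does not say what M was, so absolute times will differ from the ones above.

```python
import timeit
import numpy as np
from scipy import sparse

# Hypothetical test input; the original M is not specified.
M = sparse.random(1000, 1000, density=0.01, format="lil")

def fill_by_loop(m):
    """Copy stored entries into a dense array one element at a time."""
    out = np.zeros(m.shape, dtype=m.dtype)
    coo = m.tocoo()
    for i, j, d in zip(coo.row, coo.col, coo.data):
        out[i, j] = d
    return out

def fill_by_toarray(m):
    """Let SciPy build the dense array."""
    return m.toarray()

for fn in (fill_by_loop, fill_by_toarray):
    # Average over 10 calls; fn is bound at call time inside this iteration.
    t = timeit.timeit(lambda: fn(M), number=10) / 10
    print(f"{fn.__name__}: {t * 1e3:.2f} ms per call")
```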
But if your target really is a dense array, you don't need to iterate at all; a single fancy-indexing assignment does it:
```
In [603]: %%timeit
   .....: A=np.empty(M.shape,M.dtype)
   .....: aa=M.tocoo()
   .....: A[aa.row,aa.col]=aa.data
   .....: 
100 loops, best of 3: 8.22 ms per loop
```
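One caveat on the timed cell: np.empty does not initialize memory, so every position not covered by aa.row, aa.col holds arbitrary values. A minimal sketch of the same fancy-indexing assignment starting from np.zeros instead; `coo_to_dense` is a hypothetical helper name and the random input is for illustration only.

```python
import numpy as np
from scipy import sparse

def coo_to_dense(m):
    """Scatter the stored entries of a SciPy sparse matrix into a dense array."""
    coo = m.tocoo()
    out = np.zeros(m.shape, dtype=m.dtype)  # zeros, not empty: unset cells must read as 0
    out[coo.row, coo.col] = coo.data        # one vectorized assignment, no Python loop
    return out

M = sparse.random(200, 300, density=0.05, format="lil")
A = coo_to_dense(M)
```

This assumes no duplicate (row, col) pairs, which holds when converting from LIL or CSR. If m is a COO matrix with duplicates, plain assignment keeps only one of them while toarray() sums them; np.add.at(out, (coo.row, coo.col), coo.data) accumulates them instead.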
My times for @Thoran's two methods are:
```
100 loops, best of 3: 5.81 ms per loop
100 loops, best of 3: 17.9 ms per loop
```
All of these are in the same ballpark.