Does np.nan in a NumPy array occupy memory?

前端未结


I have a huge CSV file that cannot be loaded into memory. Converting it to libsvm format may save some memory. The CSV file contains many NaN values. If I read lines and store


3 Answers

盖世英雄少女心

2021-01-21 01:25
According to getsizeof() from the sys module, it does. A quick example:
```
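import sys
import numpy as np

# Use floats in both arrays so the only difference is the NaN entry
# (np.nan forces y to float64, while [1, 2, 3] alone would be an integer array).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, np.nan, 3.0])

# getsizeof() reports the size in bytes of the array object plus the data it owns
x_size = sys.getsizeof(x)
y_size = sys.getsizeof(y)
print(x_size)
print(y_size)
print(y_size == x_size)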
```
This should print something like the following (the exact byte count depends on the NumPy version and platform):
```
120
120
True
```
So my conclusion is that a NaN entry uses as much memory as any other entry.
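The per-element cost can also be read off the array itself, without the object overhead that getsizeof() includes. A minimal sketch, assuming both arrays are float64:

```python
import numpy as np

# Both arrays are float64, so every element takes 8 bytes,
# whether it holds a number or NaN.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, np.nan, 3.0])

print(x.dtype, x.itemsize, x.nbytes)  # float64 8 24
print(y.dtype, y.itemsize, y.nbytes)  # float64 8 24
```

Here itemsize is the size of one element in bytes and nbytes is the size of the whole data buffer.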

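Instead, you could use sparse matrices (scipy.sparse), which store only the non-zero entries and are therefore more memory efficient when most values are zero. Note that NaN is not zero, so NaN entries would still be stored unless you drop them or replace them with zeros first. Also, the SciPy documentation strongly discourages using NumPy functions directly on sparse matrices (https://docs.scipy.org/doc/scipy/reference/sparse.html), since NumPy might not convert them correctly.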
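As a rough illustration of the sparse approach, here is a small sketch using scipy.sparse.csr_matrix. It assumes that missing values may be treated as implicit zeros, which may or may not be appropriate for your data:

```python
import numpy as np
from scipy import sparse

dense = np.array([[1.0, np.nan, 0.0],
                  [np.nan, np.nan, 2.0]])

# NaN is not zero, so a direct conversion stores the NaNs explicitly.
as_is = sparse.csr_matrix(dense)
print(as_is.nnz)  # 5: two numbers plus three NaNs

# Assumption: missing values can be treated as implicit zeros.
filled = sparse.csr_matrix(np.nan_to_num(dense, nan=0.0))
print(filled.nnz)  # 2: only the real non-zero numbers are stored
```

The nnz attribute is the number of explicitly stored entries, which is what drives the memory use of a sparse matrix.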
遥遥无期

2021-01-21 01:35

When working with floating point representations of numbers, non-numeric values (NaN and inf) are also represented by a specific binary pattern occupying the same number of bits as any numeric floating point value. Therefore, NaNs occupy the same amount of memory as any other number in the array.
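To make this concrete, the sketch below reinterprets a float64 array as 64-bit unsigned integers to show the raw bit patterns. In IEEE 754 double precision, NaN and inf both have all eleven exponent bits set; the remaining NaN bits can vary, so they are printed rather than assumed:

```python
import numpy as np

values = np.array([1.0, np.nan, np.inf], dtype=np.float64)

# Reinterpret the same 8 bytes per element as unsigned integers
# (no copy is made; this is just a different view of the buffer).
bits = values.view(np.uint64)

for v, b in zip(values, bits):
    exponent = (int(b) >> 52) & 0x7FF  # the 11 exponent bits
    print(float(v), hex(int(b)), "exponent all ones:", exponent == 0x7FF)

print(values.itemsize)  # 8 bytes per element, NaN included
```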

醉话见心

2021-01-21 01:40

As far as I know, yes: NaN and zero values occupy the same memory as any other value. However, you can address your problem in other ways.

Have you tried using a sparse vector? Sparse vectors are intended for data with many zero values, and their memory consumption is optimized accordingly.

SVM Module Scipy

Sparse matrices Scipy

There you have some information about SVM and sparse matrices. If you have further questions, just ask.
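Since the question is about a CSV file that does not fit in memory, one option is to convert it to libsvm format one row at a time, never holding the whole file. The following is only a sketch under several assumptions: the first column is the label, there is no header row, empty cells and "nan" both mean missing, and missing or zero values can simply be omitted. The helper names csv_row_to_libsvm and convert are made up for this example:

```python
import csv
import io
import math

def csv_row_to_libsvm(row):
    """Hypothetical helper: first cell is the label, the rest are features."""
    label, features = row[0], row[1:]
    parts = [label]
    for index, cell in enumerate(features, start=1):  # libsvm indices start at 1
        value = float(cell) if cell.strip() else math.nan
        if math.isnan(value) or value == 0.0:
            continue  # assumption: missing and zero entries are left out
        parts.append(f"{index}:{value!r}")
    return " ".join(parts)

def convert(src, dst):
    """Stream rows from src to dst so only one row is in memory at a time."""
    for row in csv.reader(src):
        if row:
            dst.write(csv_row_to_libsvm(row) + "\n")

# Small in-memory demo; with real data, pass open file objects instead.
src = io.StringIO("1,0.5,nan,3\n0,nan,,2.25\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```

Whether an omitted entry is later read back as zero or as missing depends on the library that loads the libsvm file, so check that before dropping the NaNs this way.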

Edited to provide an answer as well as a solution
