“Large data” workflows using pandas

被撕碎了的回忆 2020-11-21 07:32

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work, and it is great for its out-of-core support.
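
For context, one pandas-native way to work on a file larger than memory is chunked iteration, via the chunksize argument to read_csv. A minimal sketch, where big.csv and the columns key and value are placeholders:

    >>> import pandas as pd
    >>> total = None
    >>> for chunk in pd.read_csv("big.csv", chunksize=1_000_000):
    ...     # Aggregate each chunk, then fold the partial result into the total.
    ...     part = chunk.groupby("key")["value"].sum()
    ...     total = part if total is None else total.add(part, fill_value=0)
    ...
    >>> total  # per-key sums computed without loading the whole file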

16 Answers
  •  南方客 (OP) 2020-11-21 08:03

    One trick I found helpful for large data use cases is to reduce the volume of the data by reducing float precision to 32-bit. It's not applicable in all cases, but in many applications 64-bit precision is overkill and the 2x memory savings are worth it. To make an obvious point even more obvious:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.random.randn(int(1e8), 5))
    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 100000000 entries, 0 to 99999999
    Data columns (total 5 columns):
    ...
    dtypes: float64(5)
    memory usage: 3.7 GB

    >>> df.astype(np.float32).info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 100000000 entries, 0 to 99999999
    Data columns (total 5 columns):
    ...
    dtypes: float32(5)
    memory usage: 1.9 GB
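
    The same downcast can be applied selectively to a frame that is already loaded. A minimal sketch, assuming the goal is to shrink every float64 column; shrink_floats is a hypothetical helper, not a pandas API:

    >>> def shrink_floats(df):
    ...     # Hypothetical helper: cast each float64 column down to float32,
    ...     # halving the memory those columns occupy.
    ...     out = df.copy()
    ...     cols = out.select_dtypes(include="float64").columns
    ...     out[cols] = out[cols].astype(np.float32)
    ...     return out
    ...
    >>> df = pd.DataFrame(np.random.randn(1_000_000, 5))
    >>> shrink_floats(df).info()  # reports float32(5), roughly half the memory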
    
