NumPy is an extremely useful library, and from using it I've found that it's capable of handling matrices which are quite large (10000 x 10000) easily, but it begins to struggle with anything much larger.
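For a rough sense of scale, a dense 10000 x 10000 matrix of float64 values already takes 800 MB, so it's easy to see why much larger dense matrices become a problem:

import numpy as np

a = np.zeros((10000, 10000))  # float64 by default
print(a.nbytes / 1e9)         # 0.8 -- i.e. 800 MB for a single dense matrix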
Usually when we deal with large matrices we implement them as sparse matrices.
I don't know whether NumPy itself supports sparse matrices, but I found this instead.
It's a bit alpha, but http://blaze.pydata.org/ seems to be working on solving this.
To handle sparse matrices, you need the scipy package that sits on top of numpy -- see the scipy.sparse documentation for more details about the sparse-matrix options scipy gives you.
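As a quick illustration of what that looks like (a minimal sketch; the matrix size and values here are arbitrary):

import numpy as np
from scipy import sparse

# lil_matrix is efficient for building a matrix element by element.
m = sparse.lil_matrix((100000, 100000), dtype=np.float64)
m[0, 0] = 1.0
m[42, 99999] = 3.5

# Convert to CSR for fast arithmetic; only the non-zeros are stored.
csr = m.tocsr()
v = np.ones(100000)
print(csr.nnz, csr.dot(v)[0])

A dense version of this matrix would need 80 GB, while the sparse version stores just the two non-zero entries.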
PyTables and NumPy are the way to go.
PyTables will store the data on disk in HDF format, with optional compression. My datasets often get 10x compression, which is handy when dealing with tens or hundreds of millions of rows. It's also very fast; my 5-year-old laptop can crunch through data doing SQL-like GROUP BY aggregation at 1,000,000 rows/second. Not bad for a Python-based solution!
Accessing the data as a NumPy recarray again is as simple as:
data = table[row_from:row_to]
The HDF library takes care of reading in the relevant chunks of data and converting them to NumPy.
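A minimal sketch of that workflow (the file name, table name, and two-column schema are made up for the example; open_file, create_table, and slicing are standard PyTables calls):

import numpy as np
import tables

# Describe a hypothetical two-column table.
class Row(tables.IsDescription):
    x = tables.Float64Col()
    y = tables.Float64Col()

# Write a compressed table to an HDF5 file on disk.
with tables.open_file("data.h5", mode="w") as h5:
    filters = tables.Filters(complevel=5, complib="blosc")
    table = h5.create_table("/", "mytable", Row, filters=filters)
    table.append(np.zeros(1000, dtype=[("x", "f8"), ("y", "f8")]))

# Read back just one slice; HDF5 loads only the relevant chunks.
with tables.open_file("data.h5", mode="r") as h5:
    data = h5.root.mytable[100:200]  # a NumPy structured array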
Sometimes one simple solution is to use a custom type for your matrix items. Based on the range of numbers you need, you can specify a dtype manually, choosing a smaller one for your items. Since NumPy picks the largest type by default, this can be a helpful idea in many cases. Here is an example:
In [70]: a = np.arange(5)
In [71]: a[0].dtype
Out[71]: dtype('int64')
In [72]: a.nbytes
Out[72]: 40
In [73]: a = np.arange(0, 2, 0.5)
In [74]: a[0].dtype
Out[74]: dtype('float64')
In [75]: a.nbytes
Out[75]: 32
And with a custom type:
In [80]: a = np.arange(5, dtype=np.int8)
In [81]: a.nbytes
Out[81]: 5
In [76]: a = np.arange(0, 2, 0.5, dtype=np.float16)
In [78]: a.nbytes
Out[78]: 8
Are you asking how to handle a 2,500,000,000-element matrix without terabytes of RAM?
The way to handle 2 billion items without 8 billion bytes of RAM is by not keeping the matrix in memory.
That means using much more sophisticated algorithms that fetch the matrix from the file system in pieces.
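One standard way to do that with NumPy itself is numpy.memmap, which keeps the array in a file and pages pieces in on demand. A minimal sketch (the file name and shape are chosen just for illustration):

import numpy as np

# A 50000 x 50000 float64 matrix backed by a ~20 GB file on disk.
m = np.memmap("big_matrix.dat", dtype=np.float64,
              mode="w+", shape=(50000, 50000))

# Work on one block of rows at a time; only the touched pages
# need to be resident in RAM.
m[0:1000, :] = 1.0
print(m[0:1000, :].sum())
m.flush()  # push pending writes back to the file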