I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At f
Another possibility is to generate it in batches (i.e. compute many sub-arrays and stack them together at the very end). But avoid updating a single array (mask) in a for loop, as the OP is doing. This would force the whole array to load in main memory during every indexing update.
Instead, for example, to get a 30000x30000 array, create 90000 separate 100x100 arrays (a 300x300 grid of blocks), update each of these 100x100 arrays accordingly in a for loop, and finally stack these 90000 arrays together into one giant array. The finished boolean array takes about 900 MB (one byte per element), so this should need no more than 4GB of RAM, and it should be fast as well.
Minimal Example:
In [9]: a
Out[9]:
array([[0, 1],
       [2, 3]])

In [10]: np.hstack([np.vstack([a]*5)]*5)
Out[10]:
array([[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]])

In [11]: np.hstack([np.vstack([a]*5)]*5).shape
Out[11]: (10, 10)
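
The session above tiles one fixed block just to show the stacking. For the random case, the same idea might look roughly like the sketch below. The function name random_bool_matrix and its parameters (n, block, p, seed) are my own, not from the question, and I am assuming n is a multiple of the block size and that comparing uniform random floats against p is an acceptable way to get True with probability p.

```python
import numpy as np

def random_bool_matrix(n, block, p, seed=None):
    """Build an n x n boolean matrix with P(True) = p from block x block tiles.

    Hypothetical helper for illustration; assumes n is a multiple of block.
    """
    if n % block != 0:
        raise ValueError("n must be a multiple of block")
    rng = np.random.default_rng(seed)
    k = n // block  # number of tiles along each axis

    rows = []
    for _ in range(k):
        # rng.random gives floats in [0, 1); the comparison yields a bool tile.
        tiles = [rng.random((block, block)) < p for _ in range(k)]
        # Join one strip of tiles side by side.
        rows.append(np.hstack(tiles))
    # Stack all strips on top of each other at the very end.
    return np.vstack(rows)

# Small demo size; the question's case would be n=30000, block=100.
mask = random_bool_matrix(n=1000, block=100, p=0.3, seed=0)
print(mask.shape, mask.dtype, mask.mean())
```

Only one small float64 tile exists at a time, so the temporary overhead stays tiny. Note that the final np.vstack copies the strips into a new array, so peak memory is roughly twice the size of the finished matrix (about 1.8 GB for 30000x30000), which is still well under the 4GB mentioned above.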