I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At f
The problem is your RAM: the values are held in memory while the matrix is being created. I just created this matrix using this command:
np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
I used an AWS i3 instance with 64GB of RAM and 8 cores. While this matrix is being created, htop shows the process taking up ~20GB of RAM. Here is a benchmark in case you care:
time np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
CPU times: user 18.3 s, sys: 3.4 s, total: 21.7 s
Wall time: 21.7 s
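To check whether a matrix of a given size fits in RAM before creating it, the size of the finished boolean array can be estimated up front. This is a minimal sketch, assuming NumPy stores one byte per bool element; bool_matrix_gib is a hypothetical helper name, and the N below is only an illustration, not the N used in the benchmark. The peak during generation can be noticeably higher than this figure, since intermediate arrays may be allocated along the way.

```python
import numpy as np

def bool_matrix_gib(n):
    """Size in GiB of a finished n x n boolean array (one byte per element)."""
    return n * n * np.dtype(bool).itemsize / 2**30

# Illustrative size only: a 32768 x 32768 boolean matrix
print(bool_matrix_gib(2**15))  # 1.0
```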
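def mask_method(N, p):
    # Assumes `mask` already exists as a preallocated global (N, N) array.
    # Fill it one row at a time so only a single row is generated per step.
    for i in range(N):
        mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
        # Print progress every 100 rows.
        if i % 100 == 0:
            print(i)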
time mask_method(N,p)
CPU times: user 20.9 s, sys: 1.55 s, total: 22.5 s
Wall time: 22.5 s
Note that the mask method only takes up ~9GB of RAM at its peak.
Edit: The first method releases its RAM once the call is done, whereas the function method retains all of it (presumably because mask is still referenced after the function returns).
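If peak memory is the main concern, the row-by-row idea can also be written so that the array is allocated inside the function with an explicit bool dtype and filled in blocks of rows. This is a sketch under stated assumptions, not the method benchmarked above: random_bool_matrix, block_rows and seed are hypothetical names, and it compares uniform draws against p instead of calling np.random.choice. Each entry is True with probability 1-p, to match p=[p, 1-p] in the commands above.

```python
import numpy as np

def random_bool_matrix(N, p, block_rows=1024, seed=None):
    """Return an N x N bool array whose entries are True with probability 1 - p."""
    rng = np.random.default_rng(seed)
    out = np.empty((N, N), dtype=bool)  # one byte per entry
    for start in range(0, N, block_rows):
        stop = min(start + block_rows, N)
        # rng.random draws floats in [0, 1), so `>= p` holds with probability 1 - p.
        # The temporary float block is only (stop - start) x N elements.
        out[start:stop] = rng.random((stop - start, N)) >= p
    return out

m = random_bool_matrix(1000, 0.3, block_rows=256, seed=0)
print(m.shape, m.dtype)
```

A smaller block_rows lowers the size of the temporary float array at the cost of more loop iterations; the returned array itself still needs N*N bytes.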