Create large random boolean matrix with numpy

孤街浪徒 2021-01-03 20:06

I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At f

3 Answers
  •  清酒与你
    2021-01-03 21:03

    The problem is your RAM: the values are held in memory while the matrix is being created. I just created this matrix using this command:

    np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
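For a sense of scale, NumPy stores each bool in one byte, so the finished N x N matrix alone needs N * N bytes, and any temporary arrays allocated during sampling come on top of that. A rough estimate, where the sizes in the loop are example values and not the N from the question:

```python
import numpy as np

def bool_matrix_bytes(n):
    # One byte per bool entry, so an n x n mask needs n * n bytes.
    return n * n * np.dtype(bool).itemsize

# Example sizes only (assumed for illustration, not taken from the question).
for n in (10_000, 50_000, 100_000):
    print(f"N={n}: {bool_matrix_bytes(n) / 1e9:.1f} GB for the result alone")
```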

    I used an AWS i3 instance with 64GB of RAM and 8 cores. To create this matrix, htop shows that it takes up ~20GB of RAM. Here is a benchmark in case you care:

    %time np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
    
    CPU times: user 18.3 s, sys: 3.4 s, total: 21.7 s
    Wall time: 21.7 s
    
    
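    import numpy as np

    def mask_method(N, p):
        # Preallocate the boolean result, then fill it one row at a time.
        mask = np.empty((N, N), dtype=bool)
        for i in range(N):
            mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
            if i % 100 == 0:
                print(i)  # progress indicator
        return mask

    %time mask = mask_method(N, p)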
    
    CPU times: user 20.9 s, sys: 1.55 s, total: 22.5 s
    Wall time: 22.5 s
    

    Note that the mask method only takes up ~9GB of RAM at its peak.

    Edit: The first method frees the RAM once the call is done, whereas the function method retains all of it.
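If peak memory is the main concern, one possible variation on the row-by-row idea is to fill the mask in blocks of rows and compare uniform random numbers against p. This is only a sketch and not what was benchmarked above: the function name, `rows_per_chunk`, and `seed` are made up for illustration, and it uses `np.random.default_rng` with float32 samples instead of `np.random.choice`.

```python
import numpy as np

def random_bool_matrix(n, p, rows_per_chunk=1024, seed=None):
    """Return an n x n bool array whose entries are False with probability p."""
    rng = np.random.default_rng(seed)
    mask = np.empty((n, n), dtype=bool)
    for start in range(0, n, rows_per_chunk):
        stop = min(start + rows_per_chunk, n)
        # float32 uniforms in [0, 1): 4 bytes per entry, for this block only.
        u = rng.random((stop - start, n), dtype=np.float32)
        # u >= p is True with probability 1 - p, matching p=[p, 1-p] above.
        np.greater_equal(u, p, out=mask[start:stop])
    return mask

# Small usage example; a real N would be much larger.
small = random_bool_matrix(200, 0.3, rows_per_chunk=64, seed=0)
print(small.shape, small.dtype, small.mean())
```

With this layout the temporary storage is limited to one block of float32 values, so beyond the mask itself the extra memory scales with `rows_per_chunk * n` and not with `n * n`.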
