Question
@Paul Panzer shared an excellent answer on how to compute the Cartesian product of a list of NumPy arrays efficiently. I have modified his cartesian_product_transpose_pp(arrays) function so that the iteration proceeds from the left column to the right column of the returned array.
```python
import itertools
import time

import numpy as np

def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:i + 1]]  # my modification: leftmost column varies fastest
    return arr.reshape(la, -1).T

mumax = 18
mumin = 1
nsample = 8

mu_list = [i for i in range(mumin, mumax + 1)]
mu_array = np.array(mu_list, dtype=np.uint8)
mu_alist = [mu_array] * nsample

start = time.time()
cartesian_product_transpose_pp(mu_alist)
end = time.time()
print(f'\ncartesian_product_transpose_pp Time: {end - start} sec')
```
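As a quick sanity check of the column order, the function can be run on a tiny input (the function is restated here so the snippet runs on its own; the two-element inputs are arbitrary):

```python
import itertools

import numpy as np

def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        # a[idx[:i + 1]] has shape (len(a), 1, ..., 1) with i trailing
        # axes, so column i varies along axis la - 1 - i of arr
        arr[i, ...] = a[idx[:i + 1]]
    return arr.reshape(la, -1).T

result = cartesian_product_transpose_pp([np.array([0, 1])] * 2)
print(result.tolist())  # → [[0, 0], [1, 0], [0, 1], [1, 1]]
```

The leftmost column cycles fastest, which is the left-to-right iteration order described above.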
However, when this function's argument (i.e. arrays) exceeds a certain size, it requires a very large arr and fails with a MemoryError. Example:
arr = np.empty( ( la, *map(len, arrays) ), dtype=dtype )
MemoryError: Unable to allocate 82.1 GiB for an array with shape (8, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
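The figure in the traceback checks out: uint8 takes one byte per element, so the requested allocation is

```python
# Size of an array of shape (8, 18, 18, 18, 18, 18, 18, 18, 18)
# with dtype uint8 (one byte per element).
n_bytes = 8 * 18**8           # 88,159,684,608 bytes
print(n_bytes / 2**30)        # ≈ 82.1 GiB, matching the error message
```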
To address this memory error, I would like to break arr into smaller chunks, so that I can yield smaller chunks of arr.reshape(la, -1).T. How do I do this as the value of nsample increases?
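For illustration, the desired chunked behaviour could be sketched with itertools.islice. This is only an assumption-laden sketch, not part of the original question (the name cartesian_product_chunks and the chunk_size parameter are mine), and it trades the vectorised speed of the NumPy approach for constant memory:

```python
import itertools

import numpy as np

def cartesian_product_chunks(arrays, chunk_size):
    """Yield the Cartesian product of `arrays` in blocks of at most
    `chunk_size` rows, without materialising the full result."""
    dtype = np.result_type(*arrays)
    # itertools.product varies its RIGHTMOST input fastest, so reverse
    # the inputs and then flip each row to reproduce the left-to-right
    # order of the modified function in the question.
    it = itertools.product(*reversed(arrays))
    while True:
        block = list(itertools.islice(it, chunk_size))
        if not block:
            return
        yield np.array(block, dtype=dtype)[:, ::-1]
```

Peak memory is then bounded by chunk_size * la elements per yielded block, regardless of nsample.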
Source: https://stackoverflow.com/questions/62603715/a-cartesian-product-function-that-can-yield-chunks-of-result-for-large-arrays