Question
@Paul Panzer shared an excellent answer on how to compute the Cartesian product of a list of NumPy arrays efficiently. I have modified his cartesian_product_transpose_pp(arrays) function so that the iteration proceeds from the left column to the right column of the returned array.
```python
import itertools
import time

import numpy as np

def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:i + 1]]  # my modification: leftmost column varies fastest
    return arr.reshape(la, -1).T

mumax = 18
mumin = 1
nsample = 8

mu_list = [i for i in range(mumin, mumax + 1)]
mu_array = np.array(mu_list, dtype=np.uint8)
mu_alist = [mu_array] * nsample

start = time.time()
cartesian_product_transpose_pp(mu_alist)
end = time.time()
print(f'\ncartesian_product_transpose_pp Time: {end - start} sec')
```
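As a quick sanity check of the column order, the function can be run on a tiny input (the function is restated here so the snippet runs on its own; the two-element inputs are arbitrary):

```python
import itertools

import numpy as np

def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        # a[idx[:i + 1]] has shape (len(a), 1, ..., 1) with i trailing
        # axes, so column i varies along axis la - 1 - i of arr
        arr[i, ...] = a[idx[:i + 1]]
    return arr.reshape(la, -1).T

result = cartesian_product_transpose_pp([np.array([0, 1])] * 2)
print(result.tolist())  # → [[0, 0], [1, 0], [0, 1], [1, 1]]
```

The leftmost column cycles fastest, which is the left-to-right iteration order described above.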
However, when this function's argument (i.e. arrays) exceeds a certain size, it requires a very large arr and fails with a MemoryError. Example:
arr = np.empty( ( la, *map(len, arrays) ), dtype=dtype )
MemoryError: Unable to allocate 82.1 GiB for an array with shape (8, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
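The figure in the traceback checks out: uint8 takes one byte per element, so the requested allocation is

```python
# Size of an array of shape (8, 18, 18, 18, 18, 18, 18, 18, 18)
# with dtype uint8 (one byte per element).
n_bytes = 8 * 18**8           # 88,159,684,608 bytes
print(n_bytes / 2**30)        # ≈ 82.1 GiB, matching the error message
```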
To address this memory error, I would like to break arr into smaller chunks, so that I can yield smaller chunks of arr.reshape(la, -1).T. How do I do this as the value of nsample increases?
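For illustration, the desired chunked behaviour could be sketched with itertools.islice. This is only an assumption-laden sketch, not part of the original question (the name cartesian_product_chunks and the chunk_size parameter are mine), and it trades the vectorised speed of the NumPy approach for constant memory:

```python
import itertools

import numpy as np

def cartesian_product_chunks(arrays, chunk_size):
    """Yield the Cartesian product of `arrays` in blocks of at most
    `chunk_size` rows, without materialising the full result."""
    dtype = np.result_type(*arrays)
    # itertools.product varies its RIGHTMOST input fastest, so reverse
    # the inputs and then flip each row to reproduce the left-to-right
    # order of the modified function in the question.
    it = itertools.product(*reversed(arrays))
    while True:
        block = list(itertools.islice(it, chunk_size))
        if not block:
            return
        yield np.array(block, dtype=dtype)[:, ::-1]
```

Peak memory is then bounded by chunk_size * la elements per yielded block, regardless of nsample.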
Source: https://stackoverflow.com/questions/62603715/a-cartesian-product-function-that-can-yield-chunks-of-result-for-large-arrays