A Cartesian product function that can yield chunks of the result for large arrays

Submitted by 南笙酒味 on 2020-06-29 03:49:17

Question


@Paul Panzer shared an excellent answer on how to compute the Cartesian product of a list of NumPy arrays efficiently. I have modified his cartesian_product_transpose_pp(arrays) function so that the iteration proceeds from the left column to the right column of the returned array, i.e. the first column changes fastest.

import itertools
import time

import numpy as np

def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    # Lengths are reversed because arrays[0] is placed on the last (fastest) axis.
    arr = np.empty((la, *map(len, reversed(arrays))), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        # My modification: idx[:i + 1] puts arrays[i] on axis -(i + 1) of arr[i],
        # so the first column of the result varies fastest.
        arr[i, ...] = a[idx[:i + 1]]
    return arr.reshape(la, -1).T

mumax = 18
mumin = 1
nsample = 8
mu_array = np.arange(mumin, mumax + 1, dtype=np.uint8)  # 1, 2, ..., 18
mu_alist = [mu_array] * nsample

start = time.time()
cartesian_product_transpose_pp(mu_alist)
end = time.time()
print(f'\ncartesian_product_transpose_pp Time: {end - start} sec')
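Indexing with idx[:i] would place both arrays[0] (idx[:0] is the empty tuple, so a[()] keeps shape (n,)) and arrays[1] (a[:] also has shape (n,)) on the last axis, so those two columns would come out identical. The sketch below assumes the intended order is "first column changes fastest"; it uses idx[:i + 1] with the buffer shape built from the reversed lengths and checks the rows against itertools.product. reference_product is a hypothetical helper written only for this check.

```python
import itertools

import numpy as np

def cartesian_product_transpose_pp(arrays):
    # arrays[i] fills arr[i] along its axis -(i + 1), so the first column
    # of the returned array changes fastest.
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, reversed(arrays))), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:i + 1]]
    return arr.reshape(la, -1).T

def reference_product(arrays):
    # itertools.product varies its LAST argument fastest, so pass the arrays
    # reversed and flip each tuple to get "first column fastest" order.
    return np.array([t[::-1] for t in itertools.product(*reversed(arrays))])

# Unequal lengths, so a wrong axis assignment would raise or give wrong rows.
arrays = [np.array([1, 2]), np.array([10, 20, 30]), np.array([100, 200, 300, 400])]
result = cartesian_product_transpose_pp(arrays)
assert np.array_equal(result, reference_product(arrays))
# First rows: (1, 10, 100), (2, 10, 100), (1, 20, 100), (2, 20, 100), ...
```

Using arrays of different lengths matters here: with the equal-length mu_alist, a misplaced axis silently broadcasts instead of failing.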

However, when this function's argument (i.e. arrays) exceeds a certain size, the function needs a very large arr and fails with a MemoryError. For example, with nsample = 8:

arr = np.empty( ( la, *map(len, arrays) ), dtype=dtype )
MemoryError: Unable to allocate 82.1 GiB for an array with shape (8, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
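The size in the error follows directly from the shape passed to np.empty: la × (product of the lengths) × itemsize bytes. The small helper below (product_nbytes is a hypothetical name) reproduces the 82.1 GiB figure and shows why raising nsample quickly becomes hopeless: each extra array multiplies the buffer by more than 18.

```python
import math

import numpy as np

def product_nbytes(lengths, dtype):
    # Size of the (la, n_0, ..., n_{la-1}) buffer that np.empty tries to allocate.
    return len(lengths) * math.prod(lengths) * np.dtype(dtype).itemsize

for nsample in (8, 9):
    nbytes = product_nbytes([18] * nsample, np.uint8)
    print(f"nsample={nsample}: {nbytes / 2**30:.1f} GiB")
# nsample=8: 82.1 GiB
# nsample=9: 1662.6 GiB
```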

To avoid this memory error, I would like to break arr into smaller chunks so that the function can yield smaller chunks of arr.reshape(la, -1).T instead of building the whole result at once. How can I do this as the value of nsample increases?
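One way to get chunks, sketched under the assumption that the row order of the modified function (first column fastest) should be preserved: instead of materializing arr, compute the rows for one range of flat row numbers at a time. np.unravel_index(..., order='F') turns each row number into one index per input array, with the first index changing fastest. The generator name cartesian_product_chunks and its chunk_rows parameter are hypothetical, not part of Paul Panzer's answer or of NumPy.

```python
import math

import numpy as np

def cartesian_product_chunks(arrays, chunk_rows=1_000_000):
    """Yield the Cartesian product of 1-D arrays in blocks of at most chunk_rows rows.

    Rows come out with column 0 varying fastest, matching the modified
    cartesian_product_transpose_pp.
    """
    arrays = [np.asarray(a) for a in arrays]
    dtype = np.result_type(*arrays)
    lengths = tuple(len(a) for a in arrays)
    total = math.prod(lengths)  # exact Python int, no overflow
    for start in range(0, total, chunk_rows):
        stop = min(start + chunk_rows, total)
        rows = np.arange(start, stop, dtype=np.intp)
        # One index array per input; order='F' makes the first index change fastest.
        indices = np.unravel_index(rows, lengths, order='F')
        chunk = np.empty((stop - start, len(arrays)), dtype=dtype)
        for col, (a, ix) in enumerate(zip(arrays, indices)):
            chunk[:, col] = a[ix]
        yield chunk
```

Peak memory is bounded by chunk_rows rather than nsample: roughly chunk_rows × la bytes for a uint8 chunk plus about chunk_rows × la × 8 bytes for the temporary index arrays. The total work does not shrink, though; for nsample = 8 there are 18**8 ≈ 1.1 × 10^10 rows, about 11,020 chunks at the default size. A possibly faster variant (not shown) loops over the slowest columns with itertools.product and builds each block of fast columns with the vectorized function.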

Source: https://stackoverflow.com/questions/62603715/a-cartesian-product-function-that-can-yield-chunks-of-result-for-large-arrays
