convert itertools array into numpy array

无人及你 2021-01-18 16:44

I'm creating this array:

A=itertools.combinations(range(6),2)

and I have to manipulate this array with numpy, like:

A.resh         


        
2 Answers
  • 2021-01-18 17:20

    I'm reopening this because I dislike the linked answer. The accepted answer suggests using

    np.array(list(A))  # producing a (15,2) array
    

    But the OP apparently has already tried list(A), and found it to be slow.

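    Another answer suggests using np.fromiter. But buried in its comments is the note that fromiter only builds a 1d array, so each item the iterator yields has to be a scalar. The tuples produced by combinations are not: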

    In [102]: A=itertools.combinations(range(6),2)
    In [103]: np.fromiter(A,dtype=int)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-103-29db40e69c08> in <module>()
    ----> 1 np.fromiter(A,dtype=int)
    
    ValueError: setting an array element with a sequence.
    

    So using fromiter with this itertools iterator requires somehow flattening it first.

    A quick set of timings suggests that list isn't the slow step. It's converting the list to an array that is slow:

    In [104]: timeit itertools.combinations(range(6),2)
    1000000 loops, best of 3: 1.1 µs per loop
    In [105]: timeit list(itertools.combinations(range(6),2))
    100000 loops, best of 3: 3.1 µs per loop
    In [106]: timeit np.array(list(itertools.combinations(range(6),2)))
    100000 loops, best of 3: 14.7 µs per loop
    

    I think the fastest way to use fromiter is to flatten the combinations with an idiomatic use of itertools.chain:

    In [112]: timeit np.fromiter(itertools.chain(*itertools.combinations(range(6),2)),dtype=int).reshape(-1,2)
    100000 loops, best of 3: 12.1 µs per loop
    

    Not much of a time savings, at least on this small size. (fromiter also takes a count argument, which shaves off another µs.) With a larger case, range(60), fromiter takes half the time of np.array.
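The flattening approach, together with the count argument mentioned above, might look like the sketch below. It assumes Python 3.8+ (for math.comb) and NumPy imported as np; the helper name combinations_array is hypothetical and used only for illustration.

```python
import itertools
import math

import numpy as np


def combinations_array(n, r=2):
    """Return all r-combinations of range(n) as an integer array with r columns."""
    # Flatten the tuples lazily; chain.from_iterable avoids unpacking the
    # whole iterator into arguments the way chain(*...) does.
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), r))
    # Passing the final length lets fromiter allocate the output once.
    count = math.comb(n, r) * r
    return np.fromiter(flat, dtype=int, count=count).reshape(-1, r)


pairs = combinations_array(6)
print(pairs.shape)  # (15, 2)
```

chain.from_iterable is used here instead of chain(*...) so that the combinations are consumed one at a time rather than expanded into a long argument list first.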


    A quick search for "[numpy] itertools" turns up a number of suggestions for pure numpy ways of generating all combinations. itertools is fast at generating pure Python structures, but converting those to arrays is the slow step.


    A picky point about the question.

    A is a lazy iterator (an itertools.combinations object), not an array. list(A) does produce a list of tuples, which can be described loosely as an array. But it isn't a np.array, and does not have a reshape method.
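This can be checked directly. The short sketch below assumes only itertools and NumPy imported as np; the variable name pairs is illustrative.

```python
import itertools

import numpy as np

A = itertools.combinations(range(6), 2)
print(hasattr(A, "reshape"))  # False: the combinations object is not an array

pairs = np.array(list(A))     # list(A) consumes the iterator
print(pairs.shape)            # (15, 2)
print(list(A))                # []: A is now exhausted

# Only the resulting ndarray supports reshape.
print(pairs.reshape(3, 10).shape)  # (3, 10)
```

Because the iterator can be consumed only once, any later step that needs the pairs again should work from the array, not from A.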

  • 2021-01-18 17:23

    An alternative way to get every pairwise combination of N elements is to generate the indices of the upper triangle of an (N, N) matrix using np.triu_indices(N, k=1), e.g.:

    np.vstack(np.triu_indices(6, k=1)).T
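As a quick check that the upper-triangle indices really are the same pairs, here is a small sketch comparing them with itertools.combinations. The items array and the variable names are made up for illustration.

```python
import itertools

import numpy as np

n = 6
# Row and column indices of the strictly upper triangle (k=1 skips the diagonal).
rows, cols = np.triu_indices(n, k=1)
pairs = np.column_stack((rows, cols))

# Same pairs, in the same order, as itertools.combinations(range(n), 2).
expected = np.array(list(itertools.combinations(range(n), 2)))
print(np.array_equal(pairs, expected))  # True

# The index pairs can also select combinations of arbitrary items.
items = np.array([10, 20, 30, 40])
i, j = np.triu_indices(len(items), k=1)
item_pairs = np.column_stack((items[i], items[j]))
print(item_pairs[:3].tolist())  # [[10, 20], [10, 30], [10, 40]]
```

Note that this trick covers only pairs (r = 2); combinations of three or more elements need a different construction.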
    

    For small arrays, itertools.combinations is going to win, but for large N the triu_indices trick can be substantially quicker:

    In [1]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(6), 2)), int)
    The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached 
    100000 loops, best of 3: 4.04 µs per loop
    
    In [2]: %timeit np.array(np.triu_indices(6, 1)).T
    The slowest run took 10.97 times longer than the fastest. This could mean that an intermediate result is being cached 
    10000 loops, best of 3: 22.3 µs per loop
    
    In [3]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(1000), 2)), int)
    10 loops, best of 3: 69.7 ms per loop
    
    In [4]: %timeit np.array(np.triu_indices(1000, 1)).T
    100 loops, best of 3: 10.6 ms per loop
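Timings like these depend on the machine and on the NumPy version, so it may be worth rerunning the comparison locally. The sketch below uses the standard timeit module outside IPython; the function names via_fromiter and via_triu, the sizes, and the repeat counts are arbitrary choices, not taken from the answers above.

```python
import itertools
import timeit

import numpy as np


def via_fromiter(n):
    """Pairs from itertools, flattened and loaded with np.fromiter."""
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), 2))
    return np.fromiter(flat, dtype=int).reshape(-1, 2)


def via_triu(n):
    """Pairs from the strictly upper-triangle indices of an (n, n) matrix."""
    return np.array(np.triu_indices(n, 1)).T


for n in (6, 200):
    # Both approaches should agree before their speed is compared.
    assert np.array_equal(via_fromiter(n), via_triu(n))
    for func in (via_fromiter, via_triu):
        # Best of 3 repeats, 20 calls each, reported per call.
        best = min(timeit.repeat(lambda: func(n), number=20, repeat=3)) / 20
        print(f"n={n:4d}  {func.__name__:13s} {best * 1e6:10.1f} µs per call")
```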
    