I'm creating this array:
A=itertools.combinations(range(6),2)
and I have to manipulate this array with numpy, like:
A.resh
I'm reopening this because I dislike the linked answer. The accepted answer suggests using
np.array(list(A)) # producing a (15,2) array
But the OP apparently has already tried list(A), and found it to be slow.
Another answer suggests using np.fromiter. But buried in its comments is the note that fromiter builds a 1d array, so with dtype=int each item it reads must be a scalar, not a tuple:
In [102]: A=itertools.combinations(range(6),2)
In [103]: np.fromiter(A,dtype=int)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-103-29db40e69c08> in <module>()
----> 1 np.fromiter(A,dtype=int)
ValueError: setting an array element with a sequence.
So using fromiter with this itertools iterator requires somehow flattening it.
A quick set of timings suggests that list isn't the slow step. It's converting the list to an array that is slow:
In [104]: timeit itertools.combinations(range(6),2)
1000000 loops, best of 3: 1.1 µs per loop
In [105]: timeit list(itertools.combinations(range(6),2))
100000 loops, best of 3: 3.1 µs per loop
In [106]: timeit np.array(list(itertools.combinations(range(6),2)))
100000 loops, best of 3: 14.7 µs per loop
I think the fastest way to use fromiter is to flatten the combinations with an idiomatic use of itertools.chain:
In [112]: timeit np.fromiter(itertools.chain(*itertools.combinations(range(6),2)),dtype=int).reshape(-1,2)
100000 loops, best of 3: 12.1 µs per loop
Not much of a time savings, at least on this small size. (fromiter also takes a count, which shaves off another µs.) With a larger case, range(60), the fromiter version takes half the time of array.
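The count remark can be sketched roughly as follows. This is an illustration rather than the code that was timed above: the helper name combos_array is made up, math.comb assumes Python 3.8 or later, and chain.from_iterable is used in place of chain(*...) so the combinations are never unpacked into an argument list.

```python
import itertools
import math

import numpy as np


def combos_array(n, r=2):
    """Return all r-combinations of range(n) as a (C(n, r), r) int array.

    Hypothetical helper, for illustration only.
    """
    # Flatten the tuples lazily so fromiter sees a stream of scalars.
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), r))
    # Passing the total number of scalars lets fromiter allocate the output once.
    count = math.comb(n, r) * r
    return np.fromiter(flat, dtype=int, count=count).reshape(-1, r)


pairs = combos_array(6)
print(pairs.shape)  # (15, 2)
```

Whether count gives a noticeable saving probably depends on the size of the input, so it is worth timing on the actual data.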
A quick search on [numpy] itertools turns up a number of suggestions of pure numpy ways of generating all combinations. itertools is fast for generating pure Python structures, but converting those to arrays is a slow step.
A picky point about the question: A is an iterator (loosely, a generator), not an array. list(A) does produce a list of tuples, which can be described loosely as an array. But it isn't a np.array, and does not have a reshape method.
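A small sketch of that distinction, using only the names from the question plus a few illustrative variables (has_reshape, first_pass, second_pass, arr) that are not in the original:

```python
import itertools

import numpy as np

A = itertools.combinations(range(6), 2)

# The combinations object is a one-shot iterator with no array methods.
has_reshape = hasattr(A, "reshape")   # False

first_pass = list(A)    # 15 tuples
second_pass = list(A)   # [] because the iterator is now exhausted

# Only after conversion do numpy methods such as reshape become available.
arr = np.array(first_pass)
print(has_reshape, len(first_pass), len(second_pass), arr.shape)
```

The second list(A) coming back empty is the practical catch: anything that consumes A once, including a timing run, leaves nothing for a later call.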
An alternative way to get every pairwise combination of N elements is to generate the indices of the upper triangle of an (N, N) matrix using np.triu_indices(N, k=1), e.g.:
np.vstack(np.triu_indices(6, k=1)).T
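As a quick check that the upper-triangle indices match what itertools.combinations yields, here is a minimal sketch. The values array is made-up sample data, included only to show how the index pairs might be used to combine arbitrary elements rather than just range(N).

```python
import itertools

import numpy as np

N = 6
# Row and column indices of the strict upper triangle (k=1 skips the diagonal).
rows, cols = np.triu_indices(N, k=1)
pairs = np.vstack((rows, cols)).T            # shape (N*(N-1)/2, 2)

# Same pairs, in the same order, as itertools.combinations.
expected = np.array(list(itertools.combinations(range(N), 2)))
print(np.array_equal(pairs, expected))

# The index pairs can also pick combinations out of arbitrary values.
values = np.array([10, 20, 30, 40])          # made-up sample data
i, j = np.triu_indices(len(values), k=1)
value_pairs = np.column_stack((values[i], values[j]))
print(value_pairs[:3])
```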
For small arrays, itertools.combinations is going to win, but for large N the triu_indices trick can be substantially quicker:
In [1]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(6), 2)), int)
The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 4.04 µs per loop
In [2]: %timeit np.array(np.triu_indices(6, 1)).T
The slowest run took 10.97 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 22.3 µs per loop
In [3]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(1000), 2)), int)
10 loops, best of 3: 69.7 ms per loop
In [4]: %timeit np.array(np.triu_indices(1000, 1)).T
100 loops, best of 3: 10.6 ms per loop
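Timings like these vary a lot between machines and numpy versions, so it may be worth re-running the comparison locally. The script below is one way to do that outside IPython. The function names via_fromiter and via_triu and the choice of N values are assumptions made for illustration, and the fromiter version adds a reshape(-1, 2) so that both functions return the same (n_pairs, 2) array.

```python
import itertools
import timeit

import numpy as np


def via_fromiter(n):
    # Flatten the pairs into scalars, then fold them back into two columns.
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), 2))
    return np.fromiter(flat, dtype=int).reshape(-1, 2)


def via_triu(n):
    # Upper-triangle indices, stacked as rows and transposed to (n_pairs, 2).
    return np.array(np.triu_indices(n, 1)).T


timings = {}
for n in (6, 300):
    for func in (via_fromiter, via_triu):
        # Best of 3 repeats, 10 calls each, reported per call in seconds.
        best = min(timeit.repeat(lambda: func(n), number=10, repeat=3)) / 10
        timings[(func.__name__, n)] = best
        print(f"{func.__name__}(N={n}): {best * 1e6:.1f} µs per call")
```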