In my program code I've got numpy value arrays and numpy index arrays. Both kinds are preallocated and predefined during program initialization.
Each part of the program has one array named values, on which calculations are performed, and three index arrays: idx_from_exch, idx_values and idx_to_exch. There is one global value array used to exchange the values of several parts: exch_arr.
The index arrays usually have between 2 and 5 indices; more are seldom (most probably never) needed. Their dtype=np.int32, shape and values are constant during the whole program run. I therefore set ndarray.flags.writeable=False after initialization, but this is optional. The index values of idx_values and idx_to_exch are sorted in numerical order; idx_from_exch may be sorted, but there is no way to guarantee that. All index arrays corresponding to one value array/part have the same shape.
The values arrays and also exch_arr usually have between 50 and 1000 elements. Their shape and dtype=np.float64 are constant during the whole program run; only the values of the arrays change in each iteration.
Here are the example arrays:
import numpy as np
import numba as nb
values = np.random.rand(100) * 100 # just some random numbers
exch_arr = np.random.rand(60) * 3 # just some random numbers
idx_values = np.array((0, 4, 55, -1), dtype=np.int32) # sorted but varying steps
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32) # sorted and constant steps!
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32) # not sorted and varying steps
The example indexing operations look like this:
values[idx_values] = exch_arr[idx_from_exch] # get values from exchange array
values *= 1.1 # some inplace array operations, this is just a dummy for more complex things
exch_arr[idx_to_exch] = values[idx_values] # pass some values back to exchange array
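A minimal sketch of the same three steps that avoids temporary arrays on the read side: np.take can write into a preallocated buffer through its out argument. The names buf and exchange_step are hypothetical and not part of the original program. mode="wrap" is used because the NumPy documentation notes that out is buffered with the default mode="raise"; "wrap" still maps the index -1 to the last element, but it also silently wraps out-of-range indices. Whether this beats plain fancy indexing for arrays this small would have to be measured.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.random(100) * 100
exch_arr = rng.random(60) * 3
idx_values = np.array((0, 4, 55, -1), dtype=np.int32)
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32)

# Hypothetical scratch buffer, allocated once and reused in every iteration.
buf = np.empty(idx_values.shape, dtype=np.float64)


def exchange_step(values, exch_arr, buf):
    """One iteration: pull from exch_arr, compute in place, push back."""
    # Read from the exchange array into the scratch buffer (no new array).
    np.take(exch_arr, idx_from_exch, out=buf, mode="wrap")
    values[idx_values] = buf
    values *= 1.1  # dummy in-place calculation, as in the example above
    # Read the updated values into the same buffer and pass them back.
    np.take(values, idx_values, out=buf, mode="wrap")
    exch_arr[idx_to_exch] = buf


# Reference result with plain fancy indexing, computed on copies.
ref_values, ref_exch = values.copy(), exch_arr.copy()
ref_values[idx_values] = ref_exch[idx_from_exch]
ref_values *= 1.1
ref_exch[idx_to_exch] = ref_values[idx_values]

exchange_step(values, exch_arr, buf)
```

Both variants should leave values and exch_arr in the same state; the only intended difference is that the reads go into buf instead of a freshly allocated array.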
Since these operations are applied once per iteration for several million iterations, speed is crucial. I looked into many different ways of increasing indexing speed in my previous question, but was not specific enough about my application (especially getting values by indexing with constant index arrays and passing them to another indexed array).
So far, fancy indexing seems to be the best way to do it. I'm currently also experimenting with numba guvectorize, but it does not seem to be worth the effort since my arrays are quite small. memoryviews would be nice, but since the index arrays do not necessarily have consistent steps, I know of no way to use memoryviews.
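For the special case where an index array happens to have a constant positive step (as idx_to_exch does in the example), the fancy index can be replaced by a slice, which is basic indexing and yields a view. A minimal sketch, assuming a hypothetical helper as_slice_if_regular that would be run once at initialization; irregular index arrays such as idx_from_exch would still need fancy indexing.

```python
import numpy as np


def as_slice_if_regular(idx, n):
    """Hypothetical helper: return a slice equivalent to idx, or None.

    A slice is returned only if all indices are within [0, n) and the
    step between consecutive indices is constant and positive.
    """
    idx = np.asarray(idx, dtype=np.intp)
    if idx.size == 0 or np.any(idx < 0) or np.any(idx >= n):
        return None
    if idx.size == 1:
        return slice(int(idx[0]), int(idx[0]) + 1)
    steps = np.diff(idx)
    step = int(steps[0])
    if step <= 0 or np.any(steps != step):
        return None
    return slice(int(idx[0]), int(idx[-1]) + 1, step)


exch_arr = np.arange(60, dtype=np.float64)
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32)

sl = as_slice_if_regular(idx_to_exch, exch_arr.size)
view = exch_arr[sl]  # a view on exch_arr, no copy is made
irregular = as_slice_if_regular(idx_from_exch, exch_arr.size)  # None
```

Writing through the slice (exch_arr[sl] = ...) touches the same elements as exch_arr[idx_to_exch] = ..., so the replacement only changes how the elements are addressed, not which ones.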
So is there any faster way to do repeated indexing? Is there some way of predefining memory address arrays for each indexing operation, given that dtype and shape are always constant? ndarray.__array_interface__ gave me a memory address, but I wasn't able to use it for indexing. I thought about something like:
stride_exch = exch_arr.strides[0]
mem_address = exch_arr.__array_interface__['data'][0]
# cast to a pointer-sized integer first: int32 cannot hold a memory address
idx_to_exch = idx_to_exch.astype(np.intp) * stride_exch + mem_address
Is that feasible?
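To make the address arithmetic concrete, here is a minimal sketch that computes the per-element addresses and reads them back through ctypes. It only demonstrates that the addresses point at the expected elements. As far as I know, NumPy indexing itself does not accept raw addresses, so such an array would only be usable from compiled code, and the per-element ctypes reads shown here are not meant to be fast. The name addresses is an assumption for illustration.

```python
import ctypes

import numpy as np

exch_arr = np.arange(60, dtype=np.float64)  # element i holds the value i
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32)

stride_exch = exch_arr.strides[0]  # bytes between neighbouring elements
mem_address = exch_arr.__array_interface__['data'][0]  # address of element 0

# Address of each indexed element; computed in a pointer-sized integer type.
addresses = idx_to_exch.astype(np.intp) * stride_exch + mem_address

# Read the doubles stored at those addresses (valid while exch_arr is alive).
read_back = [ctypes.c_double.from_address(int(a)).value for a in addresses]
```

The addresses stay valid only as long as exch_arr is alive and its buffer is never reallocated, which matches the requirement that all operations on it are in-place.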
I've also been looking into using strides directly with as_strided, but as far as I know only consistent strides are allowed, and my problem would require inconsistent strides.
Any help is appreciated! Thanks in advance!
edit:
I just corrected a massive error in my example calculation! The operation values = values * 1.1 changes the memory address of the array. All operations in my program code are laid out so that they do not change the memory address of the arrays, because a lot of other operations rely on using memoryviews. Thus I replaced the dummy operation with the correct in-place operation: values *= 1.1
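The difference between the two forms can be checked with a short sketch that compares the data address before and after each operation. The variable names addr_before, addr_inplace, addr_rebound and view are chosen for illustration only.

```python
import numpy as np

values = np.random.rand(100) * 100
view = values[:]  # stands in for a memoryview held elsewhere in the program
addr_before = values.__array_interface__['data'][0]

# In-place multiplication writes into the existing buffer.
values *= 1.1
addr_inplace = values.__array_interface__['data'][0]

# This form allocates a new array and rebinds the name to it;
# the old buffer stays alive because `view` still refers to it.
values = values * 1.1
addr_rebound = values.__array_interface__['data'][0]

still_shared = np.shares_memory(values, view)  # False after rebinding
```

After the rebinding, view still points at the old buffer and no longer sees updates made to values, which is why the in-place form matters here.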
Source: https://stackoverflow.com/questions/46099352/increasing-performance-of-highly-repeated-numpy-array-index-operations