I have a (large) length-N array of k distinct functions, and a length-N array of abscissae. I want to evaluate the functions at the abscissae to return a length-N array of ordinates.
This does almost the same thing as your (excellent!) self-answer, but with a bit less rigamarole. It seems marginally faster on my machine as well -- about 30ms based on a cursory test.
import numpy as np

def apply_indexed_fast(array, func_indices, func_table):
    # Input positions, grouped by function index
    func_argsort = func_indices.argsort()
    # Start of each function's chunk within the sorted indices
    func_ranges = list(np.searchsorted(func_indices[func_argsort], range(len(func_table))))
    func_ranges.append(None)  # open-ended end for the final chunk
    out = np.zeros_like(array)
    for f, start, end in zip(func_table, func_ranges, func_ranges[1:]):
        ix = func_argsort[start:end]
        out[ix] = f(array[ix])
    return out
Like yours, this splits a sequence of argsort indices into chunks, each of which corresponds to a function in func_table. It then uses each chunk to select input and output indices for its corresponding function. To determine the chunk boundaries, it uses np.searchsorted instead of np.unique -- where searchsorted(a, b) could be thought of as a binary search algorithm that returns the index of the first value in a equal to or greater than the given value or values in b.
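To make the chunk boundaries concrete, here is a minimal sketch using made-up function indices (the values, and the names order and starts, are assumptions for illustration only). A stable sort is requested purely so the ordering within each chunk is predictable:

```python
import numpy as np

# Hypothetical function indices for six inputs, choosing among three functions.
func_indices = np.array([2, 0, 1, 0, 2, 1])

# Positions of the inputs, grouped by function index.
order = func_indices.argsort(kind="stable")
sorted_indices = func_indices[order]

# For each function number 0, 1, 2: the first position in sorted_indices
# holding a value >= that number, i.e. where that function's chunk begins.
starts = np.searchsorted(sorted_indices, np.arange(3))

print(order)           # positions grouped by function
print(sorted_indices)  # the sorted function indices
print(starts)          # chunk start for each function
```

Here function 0 owns order[0:2], function 1 owns order[2:4], and function 2 owns order[4:].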
Then the zip function simply iterates over its arguments in parallel, taking one item from each and collecting them into a tuple; in Python 2 it strings those tuples together into a list. (So zip([1, 2, 3], ['a', 'b', 'c'], ['b', 'c', 'd']) returns [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'c', 'd')].) This, along with the for statement's built-in ability to "unpack" those tuples, allows for a terse but expressive way to iterate over multiple sequences in parallel.
In this case, I've used it to iterate over the functions in func_table alongside two out-of-sync copies of func_ranges. This ensures that the item from func_ranges in the end variable is always one step ahead of the item in the start variable. By appending None to func_ranges, I ensure that the final chunk is handled gracefully -- zip stops when any one of its arguments runs out of items, which cuts off the final value in the sequence. Conveniently, the None value also serves as an open-ended slice index!
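A small sketch of the pairing trick on its own, using a hypothetical func_ranges and a toy list in place of the real arrays:

```python
# Hypothetical chunk starts, as np.searchsorted might return them.
func_ranges = [0, 2, 4]
func_ranges.append(None)  # sentinel: acts as an open-ended slice end

# Pair each start with the following start; zip stops at the shorter sequence.
pairs = list(zip(func_ranges, func_ranges[1:]))

data = list("abcdef")  # toy stand-in for the sorted index array
chunks = [data[start:end] for start, end in pairs]

print(pairs)
print(chunks)
```

The last pair is (4, None), so data[4:None] picks up everything from position 4 to the end.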
Another trick that does the same thing requires a few more lines, but has lower memory overhead, especially when used with the itertools equivalent of zip, izip:
import itertools  # Python 2 only: itertools.izip does not exist in Python 3

range_iter_a = iter(func_ranges)  # create iterators that walk over the
range_iter_b = iter(func_ranges)  # values in `func_ranges` without making copies
next(range_iter_b, None)          # advance the second iterator by one
for f, start, end in itertools.izip(func_table, range_iter_a, range_iter_b):
    ...
However, these low-overhead generator-based approaches can sometimes be a bit slower than vanilla lists. Also, note that in Python 3, zip behaves more like izip: it returns a lazy iterator rather than building a list.
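For Python 3, the same two-iterator pattern should work with the built-in zip. The following is a sketch only; the function name apply_indexed_lazy and the sample inputs are assumptions made for illustration:

```python
import numpy as np

def apply_indexed_lazy(array, func_indices, func_table):
    """Python 3 variant of apply_indexed_fast using two offset iterators."""
    func_argsort = func_indices.argsort()
    func_ranges = list(np.searchsorted(func_indices[func_argsort],
                                       range(len(func_table))))
    func_ranges.append(None)  # open-ended end for the final chunk

    range_iter_a = iter(func_ranges)  # yields chunk starts
    range_iter_b = iter(func_ranges)  # yields chunk ends
    next(range_iter_b, None)          # advance the second iterator by one

    out = np.zeros_like(array)
    # Built-in zip is lazy in Python 3, so no paired list is built.
    for f, start, end in zip(func_table, range_iter_a, range_iter_b):
        ix = func_argsort[start:end]
        out[ix] = f(array[ix])
    return out

# Hypothetical inputs: negate where the index is 0, square where it is 1.
array = np.array([1.0, 2.0, 3.0, 4.0])
func_indices = np.array([1, 0, 1, 0])
result = apply_indexed_lazy(array, func_indices, [np.negative, np.square])
print(result)
```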
That is a beautiful example of functional programming being somewhat emulated in Python.
Now, if you want to apply your function to a set of points, I'd recommend numpy's ufunc framework, which will allow you to create blazingly fast vectorized versions of your functions.
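As one possible illustration, np.frompyfunc wraps a scalar Python function as a ufunc. The function scalar_func below is hypothetical. Note that a ufunc built this way still calls the Python function once per element, so the real speed gains come from expressing the function with NumPy's compiled ufuncs (np.sin, np.where, and so on) where that is possible:

```python
import math
import numpy as np

def scalar_func(x):
    # Hypothetical function that only accepts a single number.
    return math.sin(x) if x > 0 else 0.0

# Wrap it as a ufunc taking one input and returning one output.
as_ufunc = np.frompyfunc(scalar_func, 1, 1)

points = np.array([-1.0, 0.0, math.pi / 2])
# frompyfunc returns an object array, so convert back to float.
values = as_ufunc(points).astype(float)

# The same function written with compiled NumPy operations.
values_native = np.where(points > 0, np.sin(points), 0.0)

print(values)
print(values_native)
```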
Thanks to hpaulj for the suggestion to pursue a groupby approach. There are lots of canned routines out there for this operation, such as Pandas DataFrames, but they all come with the overhead of initializing the data structure. That is a one-time cost, but it can be significant if you only need a single calculation.
Here is my pure numpy solution, which is a factor of 13 faster than the original where loop I was using. In short, I use np.argsort and np.unique together with some fancy indexing gymnastics.
First we sort the function indices, and then find the elements of the sorted array where each new index begins:
idx_funcsort = np.argsort(function_indices)
unique_funcs, unique_func_indices = np.unique(function_indices[idx_funcsort], return_index=True)
Now there is no longer a need for blind lookups, since we know exactly which slice of the sorted array corresponds to each unique function. So we still loop over each called function, but without calling where:
for func_index in range(len(unique_funcs)-1):
    idx_func = idx_funcsort[unique_func_indices[func_index]:unique_func_indices[func_index+1]]
    func = func_table[unique_funcs[func_index]]
    desired_output[idx_func] = func(abcissa_array[idx_func])
That covers all but the final index, which somewhat annoyingly we need to call individually due to Python indexing conventions:
func_index = len(unique_funcs)-1
idx_func = idx_funcsort[unique_func_indices[func_index]:]
func = func_table[unique_funcs[func_index]]
desired_output[idx_func] = func(abcissa_array[idx_func])
This gives identical results to the where loop (a bookkeeping sanity check), but the runtime of this loop is 0.027 seconds, a speedup of 13x over my original calculation.
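For anyone who wants to reproduce the sanity check, here is a self-contained sketch. The function names apply_with_where and apply_grouped, the sample func_table, and the random test data are all assumptions; in particular, apply_with_where is only my guess at what the original where loop looked like. Appending the array length to the chunk starts is one way to avoid handling the final index separately:

```python
import numpy as np

def apply_with_where(abcissa_array, function_indices, func_table):
    # Assumed form of the original loop: one np.where lookup per function.
    out = np.zeros_like(abcissa_array)
    for i, func in enumerate(func_table):
        idx = np.where(function_indices == i)[0]
        out[idx] = func(abcissa_array[idx])
    return out

def apply_grouped(abcissa_array, function_indices, func_table):
    idx_funcsort = np.argsort(function_indices)
    unique_funcs, unique_func_indices = np.unique(
        function_indices[idx_funcsort], return_index=True)
    # Add the total length as a final boundary so the last chunk
    # is sliced the same way as all the others.
    bounds = np.append(unique_func_indices, len(function_indices))
    out = np.zeros_like(abcissa_array)
    for k, func_id in enumerate(unique_funcs):
        idx_func = idx_funcsort[bounds[k]:bounds[k + 1]]
        out[idx_func] = func_table[func_id](abcissa_array[idx_func])
    return out

# Hypothetical test data.
rng = np.random.default_rng(0)
func_table = [np.sin, np.cos, np.square]
function_indices = rng.integers(0, len(func_table), size=1000)
abcissa_array = rng.uniform(-1.0, 1.0, size=1000)

reference = apply_with_where(abcissa_array, function_indices, func_table)
grouped = apply_grouped(abcissa_array, function_indices, func_table)
print(np.allclose(reference, grouped))
```

Timings will depend on N, k, and the cost of the functions themselves, so it is worth measuring on your own data rather than relying on the figures quoted here.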