Question
I have a very long array* of length L (let's call it values) that I want to sum over, and a sorted 1D array of the same length L that contains N integers with which to partition the original array – let's call this array labels.

What I'm currently doing is this (module being cupy or numpy):
result = module.empty(N)
for i in range(N):
    result[i] = values[labels == i].sum()
But this can't be the most efficient way of doing it (it should be possible to get rid of the for loop, but how?). Since labels is sorted, I could easily determine the break points and use those indices as start/stop points, but I don't see how this solves the for loop problem.

Note that I would like to avoid creating an array of size N x L along the way, if possible, since L is very large.

I'm working in cupy, but any numpy solution is welcome too and could probably be ported. Within cupy, it seems this would be a case for a ReductionKernel, but I don't quite see how to do it.
* in my case, values is 1D, but I assume the solution wouldn't depend on this
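For reference, the break points mentioned above can indeed remove the loop on the CPU via np.add.reduceat. This is a numpy-only sketch with made-up data; it assumes every label 0..N-1 actually occurs at least once (np.add.reduceat does not handle empty segments cleanly), and CuPy's ufuncs may not implement reduceat, so it would not port directly:

```python
import numpy as np

N = 4
values = np.array([1., 2., 3., 4., 5., 6.])
labels = np.array([0, 0, 1, 2, 2, 3])  # sorted, every label present

# Break points: start index of each label's run in the sorted array.
starts = np.searchsorted(labels, np.arange(N))

# Sum each contiguous segment in one call, without an N x L intermediate.
result = np.add.reduceat(values, starts)
print(result)  # [3. 3. 9. 6.]
```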
Answer 1:
You are describing a groupby sum aggregation. You could write a CuPy RawKernel for this, but it would be much easier to use the existing groupby aggregations implemented in cuDF, the GPU dataframe library. They can interoperate without requiring you to copy the data. If you call .values on the resulting cuDF Series, it will give you a CuPy array.

If you went back to the CPU, you could do the same thing with pandas.
import cupy as cp
import pandas as pd

N = 100
values = cp.random.randint(0, N, 1000)
labels = cp.sort(cp.random.randint(0, N, 1000))
L = len(values)

result = cp.empty(N)  # one slot per label, not per element
for i in range(N):
    result[i] = values[labels == i].sum()
result[:5]
array([547., 454., 402., 601., 668.])
import cudf
df = cudf.DataFrame({"values": values, "labels": labels})
df.groupby(["labels"])["values"].sum().values[:5]
array([547, 454, 402, 601, 668])
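The pandas route mentioned above looks much the same on the CPU. A small illustrative sketch with random data (note that groupby only emits groups for labels that actually occur, so a reindex is needed to guarantee all N slots):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
N = 100
values = rng.integers(0, N, 1000)
labels = np.sort(rng.integers(0, N, 1000))

df = pd.DataFrame({"values": values, "labels": labels})

# One sum per label; reindex fills labels that never occur with 0.
result = (
    df.groupby("labels")["values"]
    .sum()
    .reindex(range(N), fill_value=0)
    .to_numpy()
)
```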
Answer 2:
Here is a solution which, instead of an N x L array, uses an N x <max partition size in labels> array (which should not be large, if the disparity between different partitions is not too high):

- Resize the array into a 2-D array with one partition per row. Since the row length equals the size of the maximum partition, fill the unavailable slots with zeros (which doesn't affect any sum). This uses @Divakar's solution given here.
def jagged_to_regular(a, parts):
    lens = np.ediff1d(parts, to_begin=parts[0])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.zeros(mask.shape, dtype=a.dtype)
    out[mask] = a
    return out

# jagged_to_regular expects the break points (end index of each partition),
# not the labels array itself; derive them from the sorted labels first.
parts = np.searchsorted(labels, np.arange(1, N + 1))
parts_stack = jagged_to_regular(values, parts)
- Sum along axis 1:
result = np.sum(parts_stack, axis = 1)
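To make the padding approach concrete, here is a small end-to-end check with made-up data (parts here holds the cumulative end index of each partition):

```python
import numpy as np

def jagged_to_regular(a, parts):
    # Length of each partition, recovered from the cumulative end indices.
    lens = np.ediff1d(parts, to_begin=parts[0])
    # Row i has lens[i] valid slots; the rest stay zero-padded.
    mask = lens[:, None] > np.arange(lens.max())
    out = np.zeros(mask.shape, dtype=a.dtype)
    out[mask] = a
    return out

values = np.array([1., 2., 3., 4., 5., 6.])
parts = np.array([2, 3, 6])          # partitions: [1,2], [3], [4,5,6]

parts_stack = jagged_to_regular(values, parts)
result = np.sum(parts_stack, axis=1)
print(result)  # [ 3.  3. 15.]
```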
In case you'd like a CuPy implementation, there's no direct CuPy alternative to numpy.ediff1d in jagged_to_regular. In that case, you can substitute the statement with numpy.diff like so:
lens = np.insert(np.diff(parts), 0, parts[0])
and then continue to use CuPy as a drop-in replacement for numpy.
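A quick numpy-only check, with made-up break points, that the diff-based substitution produces the same lengths as ediff1d:

```python
import numpy as np

parts = np.array([2, 3, 6])  # hypothetical break points (end indices)

lens_ediff1d = np.ediff1d(parts, to_begin=parts[0])
lens_diff = np.insert(np.diff(parts), 0, parts[0])
print(lens_ediff1d, lens_diff)  # [2 1 3] [2 1 3]
```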
Source: https://stackoverflow.com/questions/62836599/calculate-partitioned-sum-efficiently-with-cupy-or-numpy