I currently use the following function, but I don't think it's efficient. I'd like to improve it, but I can't make sense of the documentation describing how CuPy kernels are defined.
def cupy_sum(self