I have many large lists (>100,000,000 elements) of integers that contain many duplicates. I want to get the indices where each element occurs. Currently I am doing something li
The numpy_indexed package (disclaimer: I am its author) implements a solution inspired by Jaime's, but with tests, a nice interface, and a lot of related functionality:
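import numpy as np
import numpy_indexed as npi
# unique: the distinct values in a; idx_groups: for each of them, the positions where it occurs in a
unique, idx_groups = npi.group_by(a, np.arange(len(a)))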
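For readers who would rather not add a dependency, here is a minimal NumPy-only sketch of the same sort-based grouping idea. It is not necessarily Jaime's exact code, nor how numpy_indexed works internally, and the function name `indices_by_value` is hypothetical. It does one stable `argsort` (O(n log n)), finds where the sorted values change, and splits the sort order at those boundaries, so each group keeps its indices in their original order.

```python
import numpy as np

def indices_by_value(a):
    """Hypothetical helper: group the positions of equal values in a 1-D array.

    Returns (unique_values, list_of_index_arrays), with unique_values sorted.
    """
    a = np.asarray(a)
    if a.size == 0:
        # np.split on an empty array would yield one empty group; avoid that.
        return a, []
    # A stable sort keeps the original order of indices within each group.
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    # A new group starts wherever the sorted value differs from its predecessor.
    boundaries = np.flatnonzero(sorted_a[1:] != sorted_a[:-1]) + 1
    # The first element of each group is that group's value.
    unique = sorted_a[np.concatenate(([0], boundaries))]
    # Cut the permutation at the boundaries: each piece holds one value's indices.
    groups = np.split(order, boundaries)
    return unique, groups

a = np.array([3, 1, 3, 2, 1, 3])
unique, idx_groups = indices_by_value(a)
print(unique)                             # [1 2 3]
print([g.tolist() for g in idx_groups])   # [[1, 4], [3], [0, 2, 5]]
```

With >100,000,000 elements the sort dominates the runtime. If there are very many distinct values, the Python list of small arrays returned by `np.split` can also become a noticeable overhead.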