I have many large (>100,000,000) lists of integers that contain many duplicates. I want to get the indices where each element occurs. Currently I am doing something li
This can be solved with pandas (the Python data analysis library) and a DataFrame.groupby call.
Consider the following
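    import numpy as np
    import pandas as pd

    a = np.array([1, 2, 6, 4, 2, 3, 2])
    df = pd.DataFrame({'a': a})
    # Group the rows by the values in column 'a'
    gg = df.groupby(by=df.a)
    # Mapping from each distinct value to the index labels where it occurs
    gg.groups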
Output:
{1: [0], 2: [1, 4, 6], 3: [5], 4: [3], 6: [2]}
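
For reuse across many lists, the same idea can be wrapped in a small helper. This is only a sketch: the function name `indices_by_value` is hypothetical, and it uses the groupby object's `indices` attribute (which maps each group key to a NumPy array of integer positions) rather than `groups`. The arrays are converted to plain lists for readability; for very large inputs you would probably keep them as arrays.

```python
import numpy as np
import pandas as pd


def indices_by_value(values):
    """Map each distinct value to the positions where it occurs (hypothetical helper)."""
    s = pd.Series(np.asarray(values))
    # Group the series by its own values; .indices gives, for each key,
    # a NumPy array of the integer positions belonging to that group.
    groups = s.groupby(s).indices
    # Convert the position arrays to plain Python lists.
    return {key: idx.tolist() for key, idx in groups.items()}


a = np.array([1, 2, 6, 4, 2, 3, 2])
result = indices_by_value(a)
print(result[2])  # positions at which the value 2 occurs
```

Because `indices` reports positions rather than index labels, it matches `groups` only when the data has the default 0..n-1 index, as it does here.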