I have this huge dataset (100M rows) of consumer transactions that looks as follows:
df = pd.DataFrame({'id':[1, 1, 2, 2, 3],'brand':['a','b','a','