Say I have the following DataFrame:
df = pd.DataFrame({'a': ['a', 'b', 'c (not a)', 'this is (random)'] * 10000})
I want to produce the