I have recently started looking at Dask for big data, and I have a question about applying operations efficiently in parallel.
Say I have some sales data like this:
cu
Setting the index to the required column and then using map_partitions is much more efficient than groupby.