I have the following statement that takes hours to execute on a large dataframe (billions of records). I read that groupby is expensive and should be avoided. Our spar