I have 1 large dataframe that is created by importing a csv file (sparkscv). This dataframe has many rows of daily data. The data is identified by date, region, service_of