I have two PySpark DataFrames.
One is pulled from a SQL database. It contains a unit number for each day at 5-minute intervals and has millions of rows.
The