I am processing data with Spark and trying to get a distinct count on one of the DataFrames. The data is read in from Parquet files hosted in AWS S3.
I read in all th