I have a clustering algorithm in Python that I am trying to convert to PySpark (for parallel processing).
I have a dataset that contains regions and the stores within those