I\'m generating a Dask dataframe to be used downstream in a clustering algorithm supplied by dask-ml. In a previous step in my pipeline I read a dataframe from disk using th