Let's say I have N data points spread across m different machines, where N is on the order of millions, and I want to draw a sample of K data points in a distributed fashion.
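
To make the setup concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes the goal is a uniform sample without replacement. Each machine tags its points with a uniform random key and keeps only its K smallest keys, and a coordinator then keeps the K smallest keys overall. The function names (`local_sample`, `merge_samples`, `distributed_sample`) and the per-machine seeding are hypothetical, and the m machines are simulated here as plain Python lists.

```python
import heapq
import random


def local_sample(points, k, rng):
    """Runs on one machine: tag each point with a random key, keep the k smallest keys."""
    keyed = ((rng.random(), p) for p in points)
    return heapq.nsmallest(k, keyed, key=lambda kp: kp[0])


def merge_samples(local_results, k):
    """Runs on the coordinator: keep the k smallest keys across all machines."""
    merged = (kp for result in local_results for kp in result)
    return [p for _, p in heapq.nsmallest(k, merged, key=lambda kp: kp[0])]


def distributed_sample(shards, k, seed=0):
    """Simulates m machines; each shard is the data held by one machine."""
    # Assumption: every machine gets its own independently seeded generator.
    local_results = [
        local_sample(shard, k, random.Random(seed + i))
        for i, shard in enumerate(shards)
    ]
    return merge_samples(local_results, k)


if __name__ == "__main__":
    shards = [list(range(0, 100)), list(range(100, 250)), list(range(250, 300))]
    print(distributed_sample(shards, k=10))
```

The K globally smallest keys are always among the per-machine candidates, so each machine only needs to send K (key, point) pairs to the coordinator, regardless of how large N is.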