Using LSH in Spark to run a nearest neighbors query on every point in a DataFrame

Question

I need the k nearest neighbors for each feature vector in a DataFrame. I'm using BucketedRandomProjectionLSHModel from PySpark.

Code for creating the model:

from pyspark.ml.feature import BucketedRandomProjectionLSH

brp = BucketedRandomProjectionLSH(inputCol="features", outputCol="hashes", seed=12345, bucketLength=n)

model = brp.fit(data_df)
df_lsh = model.transform(data_df)

Now, how do I run an approximate nearest neighbor query for each point in data_df?

I have tried broadcasting the model, but got a pickle error. Defining a UDF that accesses the model also fails, with the error: Method __getstate__([]) does not exist