Spark cosine distance between rows using DataFrame

悲&欢浪女 2021-01-03 05:05

I have to compute the cosine distance between every pair of rows, but I have no idea how to do it elegantly with the Spark DataFrame API. The goal is to get a similarity score for each pair of rows.

1 answer
  • 2021-01-03 05:31

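    You can use the columnSimilarities function of IndexedRowMatrix (in pyspark.mllib.linalg.distributed, not mllib.feature). It computes the cosine similarity between columns, so you have to transpose the matrix before applying it in order to compare rows. The cosine distance is then 1 minus the similarity. The snippet assumes the first column of Pred_Factors is a row index and the second is the feature vector.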

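    from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix
    # Transpose so each original row becomes a column, then compare columns.
    pred_ = IndexedRowMatrix(Pred_Factors.rdd.map(lambda x: IndexedRow(x[0],x[1]))).toBlockMatrix().transpose().toIndexedRowMatrix()
    pred_sims = pred_.columnSimilarities()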
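
To see why the transpose is needed, here is a minimal pure-Python sketch of the same idea on a tiny in-memory matrix. It does not use Spark. The helper names (`cosine_similarity`, `column_similarities`, `transpose`) and the sample `rows` are made up for illustration, and the upper-triangular `(i, j)` output only mimics what columnSimilarities returns.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def column_similarities(matrix):
    """Cosine similarity for every pair of columns, keyed by (i, j) with i < j."""
    columns = list(zip(*matrix))
    return {
        (i, j): cosine_similarity(columns[i], columns[j])
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    }


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical data: three rows with two features each.
rows = [
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 3.0],
]

# Comparing columns of the transpose is the same as comparing the original rows.
row_sims = column_similarities(transpose(rows))

# Cosine distance is one minus cosine similarity.
row_dists = {pair: 1.0 - sim for pair, sim in row_sims.items()}
```

Here rows 0 and 1 point in the same direction, so their similarity is 1 and their distance is 0, while row 2 is orthogonal to both. In Spark, the result of columnSimilarities is a distributed CoordinateMatrix rather than a dict, so the `1 - similarity` step would be applied to its entries instead.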
    