I need to compute the cosine distance between every pair of rows, but I have no idea how to do it elegantly with the Spark DataFrame API. The idea is to compute a similarity for each pair of rows.
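You can use the columnSimilarities function of IndexedRowMatrix (from pyspark.mllib.linalg.distributed). It uses cosine similarity as its metric, so if you need a distance rather than a similarity, take 1 minus the returned value. Because it computes similarities between columns, you have to transpose the matrix before applying it, so that your original rows become columns.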
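from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix
# Transpose so the original rows become columns, then compare the columns pairwise
pred = IndexedRowMatrix(Pred_Factors.rdd.map(lambda x: IndexedRow(x[0], x[1]))).toBlockMatrix().transpose().toIndexedRowMatrix()
pred_sims = pred.columnSimilarities()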
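To sanity-check the result on a small sample, it can help to compute the same quantity locally. The following is only a plain-Python sketch of what the transpose plus columnSimilarities approach is expected to produce: one cosine similarity per pair of original rows (i < j). The function name row_cosine_similarities, the sample data in factors, and the choice to return 0.0 for zero vectors are assumptions made for illustration; none of them are part of Spark.

```python
import math

def row_cosine_similarities(rows):
    """Return {(i, j): cosine similarity} for every pair of rows with i < j."""
    # Euclidean norm of each row
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    sims = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            denom = norms[i] * norms[j]
            # Assumed convention for this sketch: a zero vector gets similarity 0.0
            sims[(i, j)] = dot / denom if denom else 0.0
    return sims

# Hypothetical sample data: three rows of two factors each
factors = [
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 3.0],
]

sims = row_cosine_similarities(factors)

# Cosine distance is 1 minus cosine similarity
distances = {pair: 1.0 - s for pair, s in sims.items()}
```

Here rows 0 and 1 are orthogonal, so their similarity is 0 and their distance is 1. Comparing a few such pairs against the entries of pred_sims is a quick way to confirm that the transpose was applied in the right direction.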