How to transform Scala nested map operation to Scala Spark operation?

情话喂你 2021-01-22 11:18

The code below calculates the Euclidean distance between two Lists in a dataset:

 val user1 = List("a", "1", "3", "2", "6", "9")  //> user1  : List[Stri


        
2 Answers
  •  走了就别回头了
    2021-01-22 11:53

    First of all, I suggest moving from storing your user model in a list to a well-typed class. Also, I don't think you need to compute the distance between a user and itself, like (a-a) and (b-b), and there is no reason to compute the same distance twice, as (a-b) and (b-a).

      val user1 = List("a", "1", "3", "2", "6", "9")
      val user2 = List("b", "1", "2", "2", "5", "9")
    
      case class User(name: String, features: Vector[Double])
    
      object User {
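        // The head is the user name; the remaining elements are numeric features.
        def fromList(list: List[String]): User = list match {
          case h :: tail => User(h, tail.map(_.toDouble).toVector)
          case Nil       => throw new IllegalArgumentException("empty user list")
        }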
      }
    
      // Euclidean distance: square root of the summed squared differences.
      def euclDistance(userA: User, userB: User): Double = {
        println(s"comparing ${userA.name} and ${userB.name}")
        val squaredDiffs = (userA.features zip userB.features) map {
          case (a, b) => (a - b) * (a - b)
        }
        Math.sqrt(squaredDiffs.sum)
      }
    
      val all = List(User.fromList(user1), User.fromList(user2))
    
    
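      // Assumes an existing SparkContext named `sc`, as provided by spark-shell.
      import org.apache.spark.rdd.RDD

      // combinations(2) yields each unordered pair once, with no self-pairs.
      val users: RDD[(User, User)] = sc.parallelize(all.combinations(2).toSeq.map {
        case l :: r :: Nil => (l, r)
      })
    
      // Bring the distances back to the driver instead of discarding them in foreach.
      users.map(t => (t._1.name, t._2.name, euclDistance(t._1, t._2))).collect().foreach(println)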
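
The pairing and distance logic from this answer can be checked on plain Scala collections before involving Spark. The following is only a minimal sketch under some assumptions: the wrapper object `PairDistances` and the third user `c` are hypothetical additions for illustration, and the features are written as doubles directly rather than parsed from strings.

```scala
object PairDistances {
  final case class User(name: String, features: Vector[Double])

  // Euclidean distance: square root of the summed squared differences.
  def euclDistance(a: User, b: User): Double =
    math.sqrt(a.features.zip(b.features).map { case (x, y) => (x - y) * (x - y) }.sum)

  val all: List[User] = List(
    User("a", Vector(1.0, 3.0, 2.0, 6.0, 9.0)),
    User("b", Vector(1.0, 2.0, 2.0, 5.0, 9.0)),
    User("c", Vector(4.0, 3.0, 2.0, 6.0, 9.0)) // hypothetical third user
  )

  // combinations(2) yields each unordered pair once (a-b, a-c, b-c),
  // so there are no self-pairs and no mirrored duplicates.
  val distances: List[(String, String, Double)] =
    all.combinations(2).collect {
      case List(l, r) => (l.name, r.name, euclDistance(l, r))
    }.toList

  def main(args: Array[String]): Unit =
    distances.foreach { case (l, r, d) => println(s"$l-$r: $d") }
}
```

For users `a` and `b` the features differ by 1 in two positions, so the distance is sqrt(2), roughly 1.414. Three users produce three pairs here, where a nested map over the full list would produce nine.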
    
