Cartesian of DStream

拈花ヽ惹草 submitted on 2019-12-12 05:37:42

Question


I use Spark's cartesian function to generate a list of N pairs of values.

I then map over these pairs to compute a distance metric between each pair of users:

val cartesianUsers: org.apache.spark.rdd.RDD[(distance.classes.User, distance.classes.User)] = users.cartesian(users)
cartesianUsers.map(m => manDistance(m._1, m._2))

This works as expected.

Using the Spark Streaming library, I create a DStream and then iterate over its RDDs:

val customReceiverStream: ReceiverInputDStream[String] = ssc.receiverStream....
customReceiverStream.foreachRDD(rdd => {
  // count() is an action, so it forces computation of this batch's RDD
  println("size is " + rdd.count())
})

I could call the cartesian function inside customReceiverStream.foreachRDD, but according to the documentation at http://spark.apache.org/docs/1.2.0/streaming-programming-guide.htm this is not its intended use:

foreachRDD(func) The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to a external system, like saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.

How can I compute the cartesian product of a DStream? Or am I misunderstanding how DStreams are meant to be used?


Answer 1:


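I wasn't aware of the transform method, which applies an RDD-to-RDD function to each RDD in a DStream (so cartesianUsers here has to be a DStream, not the RDD from the question):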

cartesianUsers.transform(car => car.cartesian(car))

There is also a nice talk that mentions the transform function at around 17:00: https://www.youtube.com/watch?v=g171ndOHgJ0
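
For anyone who wants to try this end to end, here is a minimal sketch of transform plus cartesian in a small local streaming job. Several things in it are my assumptions rather than part of the original code: the User case class and manDistance body are hypothetical stand-ins for distance.classes.User and the real distance function, queueStream replaces the custom receiver so the example needs no external source, and awaitTerminationOrTimeout assumes Spark 1.3 or later.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import scala.collection.mutable

object CartesianOfDStream {
  // Hypothetical stand-in for distance.classes.User
  case class User(id: String, features: Array[Double])

  // Hypothetical Manhattan distance; the real manDistance is not shown in the question
  def manDistance(a: User, b: User): Double =
    a.features.zip(b.features).map { case (x, y) => math.abs(x - y) }.sum

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("CartesianOfDStream")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: a queue of RDDs replaces the custom receiver for this sketch
    val queue = mutable.Queue[RDD[User]]()
    queue += ssc.sparkContext.parallelize(Seq(
      User("a", Array(0.0, 0.0)),
      User("b", Array(1.0, 2.0))
    ))
    val users: DStream[User] = ssc.queueStream(queue)

    // transform exposes each batch's RDD, so any RDD operation (here cartesian) can be used
    val pairs: DStream[(User, User)] = users.transform(rdd => rdd.cartesian(rdd))

    // From here on the pairs are an ordinary DStream and can be mapped as before
    val distances: DStream[Double] = pairs.map { case (u1, u2) => manDistance(u1, u2) }
    distances.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // stop after a few batches instead of running forever
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Note that the cartesian product is computed per batch: each RDD is paired only with itself, not with users that arrived in earlier batches.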



Source: https://stackoverflow.com/questions/29034825/cartesian-of-dstream
