Spark: access an RDD inside another RDD
Question: I have a lookup RDD of size 6000, lookup_rdd: RDD[String]:

a1
a2
a3
a4
a5
.....

and another RDD, data_rdd: RDD[(String, Iterable[(String, Int)])], i.e. (id, (item, count)), whose ids are unique:

(id1,List((a1,2), (a3,4)))
(id2,List((a2,1), (a4,2), (a1,1)))
(id3,List((a5,1)))

For EACH element in lookup_rdd I want to check whether each id has that element; if it does, I put the count, and if not, I put 0, then store the result in a file. What is an efficient way to achieve this? Is hashing possible? For example, the output I want is:

id1,2,0,4,0,0
id2,1,1,0,2,0
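A minimal sketch of one way to do this: Spark does not allow referencing one RDD inside a transformation of another, but since the lookup side is small (~6000 strings) it can be collected to the driver and broadcast, and each id's item list can be turned into a hash map for O(1) lookups. The sample data, the local master setting, and the output path below are illustrative assumptions, not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LookupCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lookup-counts").setMaster("local[*]"))

    // Illustrative stand-ins for the question's two RDDs.
    val lookup_rdd = sc.parallelize(Seq("a1", "a2", "a3", "a4", "a5"))
    val data_rdd = sc.parallelize(Seq(
      ("id1", Iterable(("a1", 2), ("a3", 4))),
      ("id2", Iterable(("a2", 1), ("a4", 2), ("a1", 1))),
      ("id3", Iterable(("a5", 1)))
    ))

    // The lookup side is small, so ship it to every executor once,
    // instead of trying to access lookup_rdd inside data_rdd's map.
    val lookupBc = sc.broadcast(lookup_rdd.collect())

    val lines = data_rdd.map { case (id, items) =>
      val counts = items.toMap // per-id hash map: item -> count
      // Emit the counts in lookup order, 0 for missing items.
      val row = lookupBc.value.map(e => counts.getOrElse(e, 0))
      (id +: row.map(_.toString)).mkString(",")
    }

    lines.saveAsTextFile("output") // hypothetical output path
    sc.stop()
  }
}
```

So hashing is indeed the core of it: the broadcast avoids a shuffle entirely, and the per-id toMap replaces a linear scan of each item list with constant-time lookups, giving roughly O(ids × lookup size) total work.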