Cartesian product of two DStreams in Spark

Submitted by 笑着哭i on 2019-12-25 07:27:16

Question


How can I compute the product of two DStreams in Apache Spark Streaming, the way cartesian(RDD<U>) works on RDDs: when called on datasets of types T and U, it returns a dataset of (T, U) pairs (all pairs of elements).
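For reference, this is the RDD-level behaviour I want to reproduce on every batch (a minimal, illustrative sketch; jsc is assumed to be an existing JavaSparkContext):

    // Sketch only: plain RDDs, not streams.
    JavaRDD<String> rddA = jsc.parallelize(Arrays.asList("a1", "a2"));
    JavaRDD<Integer> rddB = jsc.parallelize(Arrays.asList(1, 2));

    // cartesian returns every (T, U) pair: (a1,1), (a1,2), (a2,1), (a2,2)
    JavaPairRDD<String, Integer> allPairs = rddA.cartesian(rddB);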

One workaround is to use join as follows, but it doesn't seem like a good approach.

    JavaPairDStream<Integer, String> xx = DStream_A.mapToPair(s -> new Tuple2<>(1, s));

    JavaPairDStream<Integer, String> yy = DStream_B.mapToPair(e -> new Tuple2<>(1, e));

    // Every element of both streams gets the same key, so the join produces all (a, b) combinations.
    JavaPairDStream<Integer, Tuple2<String, String>> DStream_A_product_B = xx.join(yy);

Is there a better solution? Or how can I use the cartesian method of RDD here?


Answer 1:


I found the answer: use transformWithToPair to combine the two streams batch by batch and call cartesian on the underlying RDDs:

    JavaPairDStream<String, String> cartes = DStream_A.transformWithToPair(DStream_B,
        new Function3<JavaRDD<String>, JavaRDD<String>, Time, JavaPairRDD<String, String>>() {
            @Override
            public JavaPairRDD<String, String> call(JavaRDD<String> rddA, JavaRDD<String> rddB, Time time) throws Exception {
                // For each batch, take the RDDs of the two streams and build all (a, b) pairs.
                return rddA.cartesian(rddB);
            }
        });
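The same transform can be written more compactly with a Java 8 lambda (a sketch, assuming DStream_A and DStream_B are both JavaDStream<String>):

    // Per batch, take the two underlying RDDs and return their cartesian product.
    JavaPairDStream<String, String> cartes =
        DStream_A.transformWithToPair(DStream_B,
            (rddA, rddB, time) -> rddA.cartesian(rddB));

Either way, each batch interval yields every (a, b) pair of the elements that arrived on the two streams in that interval.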


Source: https://stackoverflow.com/questions/37764207/cartesian-product-of-two-dstream-in-spark
