Question
How can I compute the Cartesian product of two DStreams in Spark Streaming, analogous to cartesian(RDD<U>)?
When called on datasets of types T and U, cartesian returns a dataset of (T, U) pairs (all pairs of elements).
One workaround is to join on a constant key, as follows, but it doesn't seem like a good approach:
JavaPairDStream<Integer, String> xx = DStream_A.mapToPair(s -> {
return new Tuple2<>(1, s);
});
JavaPairDStream<Integer, String> yy = DStream_B.mapToPair(e -> {
return new Tuple2<>(1, e);
});
JavaPairDStream<Integer, Tuple2<String, String>> DStream_A_product_B = xx.join(yy);
Is there a better solution? Or how can I use the cartesian method of RDD?
Answer 1:
I found the answer:
JavaPairDStream<String, String> cartes = DStream_A.transformWithToPair(DStream_B,
new Function3<JavaRDD<String>, JavaRDD<String>, Time, JavaPairRDD<String, String>>() {
@Override
public JavaPairRDD<String, String> call(JavaRDD<String> rddA, JavaRDD<String> rddB, Time v3) throws Exception {
JavaPairRDD<String, String> res = rddA.cartesian(rddB);
return res;
}
});
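To see why the constant-key join from the question and the per-batch cartesian call produce the same pairs, here is a minimal plain-Java sketch that does not use Spark at all. The class name CartesianSketch and both helper methods are hypothetical, and each list stands in for the contents of one batch. In Spark itself, a join on a single key value would typically route every record to the same key group, which is probably why the join workaround "doesn't seem good"; that is an assumption about the motivation, not something this sketch measures.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain collections stand in for one batch of each stream.
class CartesianSketch {

    // Direct Cartesian product: pair every element of a with every element of b.
    static <A, B> List<Map.Entry<A, B>> cartesian(List<A> a, List<B> b) {
        List<Map.Entry<A, B>> out = new ArrayList<>();
        for (A x : a) {
            for (B y : b) {
                out.add(new SimpleEntry<>(x, y));
            }
        }
        return out;
    }

    // Constant-key join: tag every element with key 1, group by key,
    // then pair up the values that share a key (here, all of them).
    static <A, B> List<Map.Entry<A, B>> joinOnConstantKey(List<A> a, List<B> b) {
        Map<Integer, List<A>> left = new HashMap<>();
        Map<Integer, List<B>> right = new HashMap<>();
        for (A x : a) {
            left.computeIfAbsent(1, k -> new ArrayList<>()).add(x);
        }
        for (B y : b) {
            right.computeIfAbsent(1, k -> new ArrayList<>()).add(y);
        }
        List<Map.Entry<A, B>> out = new ArrayList<>();
        for (Map.Entry<Integer, List<A>> e : left.entrySet()) {
            List<B> matches = right.get(e.getKey());
            if (matches == null) {
                continue; // no values on the other side for this key
            }
            out.addAll(cartesian(e.getValue(), matches));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> batchA = List.of("a1", "a2");
        List<String> batchB = List.of("b1", "b2", "b3");
        // Both approaches yield |A| * |B| pairs for the batch.
        System.out.println(cartesian(batchA, batchB));
        System.out.println(joinOnConstantKey(batchA, batchB));
    }
}
```

Both methods return the same pairs for a batch; the difference in Spark lies in how the work is distributed, not in the result.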
Source: https://stackoverflow.com/questions/37764207/cartesian-product-of-two-dstream-in-spark