Null value in spark streaming from Kafka

Submitted on 2019-12-12 03:28:01

Question


I have a simple program where I'm trying to receive data from Kafka. When I start a Kafka producer and send data, for example "Hello", I get this when I print the message: (null, Hello), and I don't know why this null appears. Is there any way to avoid it? I think it comes from the first parameter of the Tuple2<String, String>, but I only want to print the second parameter. Another thing: when I print it with System.out.println("inside map " + message);, no message appears at all. Does someone know why? Thanks.

public static void main(String[] args){

    SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
    // Substitute 127.0.0.1 with the actual address of your Spark Master (or use "local" to run in local mode)
    sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");
    // Create the context with 2 seconds batch size
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

    Map<String, Integer> topicMap = new HashMap<>();
    String[] topics = KafkaProperties.TOPIC.split(",");
    for (String topic: topics) {
        topicMap.put(topic, KafkaProperties.NUM_THREADS);
    }
    /* connection to cassandra */
    CassandraConnector connector = CassandraConnector.apply(sparkConf);
    System.out.println("+++++++++++ cassandra connector created ++++++++++++++++++++++++++++");

    /* Receive kafka inputs */
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, KafkaProperties.ZOOKEEPER, KafkaProperties.GROUP_CONSUMER, topicMap);
    System.out.println("+++++++++++++ streaming-kafka connection done +++++++++++++++++++++++++++");

    JavaDStream<String> lines = messages.map(
            new Function<Tuple2<String, String>, String>() {
                public String call(Tuple2<String, String> message) {
                    System.out.println("inside map "+ message);
                    return message._2();
                }
            }
    );

    messages.print();
    jssc.start();
    jssc.awaitTermination();
}

Answer 1:


Q1) Null values: messages in Kafka are keyed, which means they all have a (key, value) structure. You see (null, Hello) because the producer published a (null, "Hello") record to the topic. If you want to omit the key in your process, map the original DStream to remove it: messages.map(new Function<Tuple2<String, String>, String>() {...})
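A minimal sketch of that mapping, reusing the messages DStream and the imports already present in the question's code; only the value of each (key, value) pair is kept:

    // Drop the Kafka key and keep only the message value
    JavaDStream<String> values = messages.map(
            new Function<Tuple2<String, String>, String>() {
                public String call(Tuple2<String, String> record) {
                    // record._1() is the key (null here); record._2() is the value
                    return record._2();
                }
            }
    );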

Q2) System.out.println("inside map "+ message); does not print. A couple of classical reasons:

  1. Transformations are applied in the executors, so when running in a cluster, that output will appear in the executors' logs and not on the driver.

  2. Operations are lazy and DStreams need to be materialized for operations to be applied.

In this specific case, the JavaDStream<String> lines is never materialized, i.e. never used in an output operation, so the map is never executed.
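One way to see the map run, sketched against the question's code: call print() on lines (an output operation) instead of, or in addition to, messages.print():

    // print() is an output operation on `lines`, so the map (and its
    // System.out.println) now executes on every batch
    lines.print();

    jssc.start();
    jssc.awaitTermination();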



Source: https://stackoverflow.com/questions/36888224/null-value-in-spark-streaming-from-kafka
