问题
I have a simple program because I'm trying to receive data using kafka
. When I start a kafka producer and I send data, for example: "Hello", I get this when I print the message: (null, Hello)
. And I don't know why this null appears. Is there any way to avoid this null? I think it's due to Tuple2<String, String>
, the first parameter, but I only want to print the second parameter. And another thing, when I print that using System.out.println("inside map "+ message);
it does not appear any message, does someone know why? Thanks.
public static void main(String[] args){
SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
// Substitute 127.0.0.1 with the actual address of your Spark Master (or use "local" to run in local mode
sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");
// Create the context with 2 seconds batch size
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));
Map<String, Integer> topicMap = new HashMap<>();
String[] topics = KafkaProperties.TOPIC.split(",");
for (String topic: topics) {
topicMap.put(topic, KafkaProperties.NUM_THREADS);
}
/* connection to cassandra */
CassandraConnector connector = CassandraConnector.apply(sparkConf);
System.out.println("+++++++++++ cassandra connector created ++++++++++++++++++++++++++++");
/* Receive kafka inputs */
JavaPairReceiverInputDStream<String, String> messages =
KafkaUtils.createStream(jssc, KafkaProperties.ZOOKEEPER, KafkaProperties.GROUP_CONSUMER, topicMap);
System.out.println("+++++++++++++ streaming-kafka connection done +++++++++++++++++++++++++++");
JavaDStream<String> lines = messages.map(
new Function<Tuple2<String, String>, String>() {
public String call(Tuple2<String, String> message) {
System.out.println("inside map "+ message);
return message._2();
}
}
);
messages.print();
jssc.start();
jssc.awaitTermination();
}
回答1:
Q1) Null values:
Messages in Kafka are Keyed, that means they all have a (Key, Value) structure.
When you see (null, Hello)
is because the producer published a (null,"Hello")
value in a topic.
If you want to omit the key in your process, map the original Dtream
to remove the key: kafkaDStream.map( new Function<String,String>() {...})
Q2) System.out.println("inside map "+ message);
does not print. A couple of classical reasons:
Transformations are applied in the executors, so when running in a cluster, that output will appear in the executors and not on the master.
Operations are lazy and DStreams need to be materialized for operations to be applied.
In this specific case, the JavaDStream<String> lines
is never materialized i.e. not used for an output operation. Therefore the map
is never executed.
来源:https://stackoverflow.com/questions/36888224/null-value-in-spark-streaming-from-kafka