Question
We are building an application to collect data from sensors. The data is streamed to Kafka, from where consumers will publish it to different data stores. Each data point has multiple attributes representing the state of the sensor.
In one of the consumers we want to publish the data to the data store only if the value has changed. For example, if there is a temperature sensor that is polled every 10 seconds, we expect to receive data like:
----------------------------------------------------------------------
Key Value
----------------------------------------------------------------------
Sensor1 {timestamp: "10-10-2019 10:20:30", temperature: 10}
Sensor1 {timestamp: "10-10-2019 10:20:40", temperature: 10}
Sensor1 {timestamp: "10-10-2019 10:20:50", temperature: 11}
In the above case only the first record and the third record should be published.
For this we need some way to compare the current value for a key with the previous value for the same key. I believe this should be possible with a KTable or KStream, but I have been unable to find examples.
Any help will be great!
Answer 1:
Here is an example of how to solve this with KStream#transformValues().
final String stateStoreName = "sensor-last-value-store"; // any unique store name

StreamsBuilder builder = new StreamsBuilder();
StoreBuilder<KeyValueStore<String, YourValueType>> keyValueStoreBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore(stateStoreName),
        Serdes.String(),
        new YourValueTypeSerde());
builder.addStateStore(keyValueStoreBuilder);

builder.stream(INPUT_TOPIC, Consumed.with(Serdes.String(), new YourValueTypeSerde()))
    .transformValues(() -> new ValueTransformerWithKey<String, YourValueType, YourValueType>() {

        private KeyValueStore<String, YourValueType> state;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            state = (KeyValueStore<String, YourValueType>) context.getStateStore(stateStoreName);
        }

        @Override
        public YourValueType transform(final String key, final YourValueType value) {
            final YourValueType prevValue = state.get(key);
            if (prevValue == null || prevValue.temperature() != value.temperature()) {
                // first value for this key, or the temperature changed: remember it and emit it
                state.put(key, value);
                return value;
            }
            // unchanged temperature: emit null so the record can be filtered out below
            return null;
        }

        @Override
        public void close() {}
    }, stateStoreName)
    .filter((key, value) -> value != null) // transformValues() forwards null values, so drop them here
    .to(OUTPUT_TOPIC);
For each record you look up the previous record for the same key in the state store. If there is no previous record, or the temperature differs, you store the current record and return it so it is forwarded downstream; otherwise you return null, and the subsequent filter discards the unchanged record.
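If you want to verify this behaviour without a broker, a quick sketch with TopologyTestDriver (from kafka-streams-test-utils) could look like the following. It assumes the topology above has been built via builder.build() into a Topology object called topology, and that YourValueType has a constructor taking a timestamp and a temperature; both are placeholders, not part of the original answer.

// Test sketch: feed the three sample readings in and check what comes out.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
    TestInputTopic<String, YourValueType> input = driver.createInputTopic(
        INPUT_TOPIC, new StringSerializer(), new YourValueTypeSerde().serializer());
    TestOutputTopic<String, YourValueType> output = driver.createOutputTopic(
        OUTPUT_TOPIC, new StringDeserializer(), new YourValueTypeSerde().deserializer());

    input.pipeInput("Sensor1", new YourValueType("10-10-2019 10:20:30", 10));
    input.pipeInput("Sensor1", new YourValueType("10-10-2019 10:20:40", 10));
    input.pipeInput("Sensor1", new YourValueType("10-10-2019 10:20:50", 11));

    // Only the first and third readings should reach the output topic.
    System.out.println(output.readValuesToList()); // expect temperatures [10, 11]
}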
Answer 2:
You can use the Kafka Streams Processor API. You set up a local key-value store as the processor's state, and the process() function is called for each record fetched.
In process() you can check the incoming record against the last value stored and accept or reject it based on your business logic (in your case, comparing the temperature value).
In a punctuate() callback you could instead forward records downstream on a schedule. See the sample code below (without punctuate).
public class SensorProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> kvStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // keep the processor context locally because we need it in process() and commit()
        this.context = context;
        // retrieve the key-value store named "SensorData"
        kvStore = (KeyValueStore<String, String>) context.getStateStore("SensorData");
        // a punctuation could be scheduled here via context.schedule(...); not used in this example
    }

    @Override
    public void process(String sensorName, String sensorData) {
        String oldValue = this.kvStore.get(sensorName);
        if (oldValue == null) {
            // first reading for this sensor: remember it and forward it
            this.kvStore.put(sensorName, sensorData);
            context.forward(sensorName, sensorData);
        } else {
            // put the business logic for the comparison here:
            // compare the temperature in sensorData with the one in oldValue and,
            // only if it changed, store the new value and forward it downstream
            this.kvStore.put(sensorName, sensorData);
            context.forward(sensorName, sensorData);
        }
        context.commit();
    }

    @Override
    public void close() {
        // nothing to do
    }
}
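For completeness, here is a rough sketch of how such a processor and its "SensorData" store could be wired into a topology; the topic names, application id and bootstrap servers are placeholders for illustration, not part of the original answer.

// Wiring sketch: source topic -> SensorProcessor (with "SensorData" store) -> sink topic.
Topology topology = new Topology();

topology.addSource("Source", "sensor-input-topic")
        .addProcessor("SensorProcessor", SensorProcessor::new, "Source")
        .addStateStore(
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("SensorData"),
                Serdes.String(),
                Serdes.String()),
            "SensorProcessor")                      // attach the store to the processor
        .addSink("Sink", "sensor-output-topic", "SensorProcessor");

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-dedup");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();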
Answer 3:
If you want to do this with Kafka Streams you have to use the Processor API.
You need to implement a custom Transformer backed by a state store.
For each message you look up the previous value in the state store: if the value has changed, or is not present, you return the new value, otherwise you return null. Apart from that, you should also save the new value in the state store (KeyValueStore::put(...)).
More about the Processor API can be found in the Kafka Streams developer guide.
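A condensed sketch of such a Transformer (reusing the hypothetical YourValueType from answer 1; the store name is also made up) might look like this. Returning null from Transformer#transform drops the record, so no extra filtering is needed.

// Sketch of a deduplicating Transformer. YourValueType and the store name are
// illustrative; the store must be registered with builder.addStateStore(...) and
// its name passed to KStream#transform(...).
public class ChangeDetectingTransformer
        implements Transformer<String, YourValueType, KeyValue<String, YourValueType>> {

    private KeyValueStore<String, YourValueType> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        store = (KeyValueStore<String, YourValueType>) context.getStateStore("last-sensor-value");
    }

    @Override
    public KeyValue<String, YourValueType> transform(String key, YourValueType value) {
        YourValueType previous = store.get(key);
        if (previous == null || previous.temperature() != value.temperature()) {
            store.put(key, value);
            return KeyValue.pair(key, value); // changed (or first) value: emit it
        }
        return null; // unchanged: returning null drops the record
    }

    @Override
    public void close() {}
}

It would then be attached with stream.transform(ChangeDetectingTransformer::new, "last-sensor-value") after registering a key-value store with that name on the StreamsBuilder.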
Source: https://stackoverflow.com/questions/58745670/kafka-compare-consecutive-values-for-a-key