apache-flink

Flink checkpoints keep failing

Submitted by 陌路散爱 on 2021-02-09 08:01:50

Question: We are trying to set up a stateful Flink job using the RocksDB backend. We use a session window with a 30-minute gap and an AggregateFunction, so we are not using any Flink state variables directly. Based on sampling, we see fewer than 20k events/s and 20-30 new sessions/s. Each session basically gathers all of its events, so the size of the session accumulator grows over time. We are running Flink 1.9 with 10 GB of memory in total across 128 containers. These are the settings: state.backend: rocksdb state.checkpoints.dir: hdfs: …
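For context, a minimal sketch of how a job like this is commonly wired up on Flink 1.9, with RocksDB incremental checkpointing enabled programmatically; the HDFS path, class names, and numbers below are placeholders, not taken from the question:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SessionAggregationJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB backend with incremental checkpoints: only changed SST files are
        // uploaded, which helps when per-session accumulators keep growing.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        env.enableCheckpointing(60_000); // checkpoint every 60 s

        env.fromElements(
                Tuple2.of("session-a", 1L),
                Tuple2.of("session-a", 1L),
                Tuple2.of("session-b", 1L))
            .keyBy(value -> value.f0)
            .window(ProcessingTimeSessionWindows.withGap(Time.minutes(30)))
            // The AggregateFunction's accumulator is the per-session state that RocksDB
            // stores; keeping it compact (a count here, not the raw events) keeps
            // checkpoint sizes bounded.
            .aggregate(new CountPerSession())
            .print();

        env.execute("session-aggregation-sketch");
    }

    /** Counts the events seen in one session window. */
    public static class CountPerSession
            implements AggregateFunction<Tuple2<String, Long>, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Tuple2<String, Long> value, Long acc) { return acc + value.f1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }
}
```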

How to properly test a Flink window function?

Submitted by 被刻印的时光 ゝ on 2021-02-08 09:49:20

Question: Does anyone know how to test windowing functions in Flink? I am using the dependency flink-test-utils_2.11. My steps are: get the StreamExecutionEnvironment, create objects and add them to the environment, do a keyBy, add a session window, and execute an aggregate function. public class AggregateVariantCEVTest extends AbstractTestBase { @Test public void testAggregateVariantCev() throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1 …
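A hedged sketch of one way such a test is often structured with flink-test-utils: a bounded fromElements source, an event-time session window, and a static collecting sink. The class names, timestamps, and the aggregate below are invented for illustration; the asker's AggregateVariantCEVTest is not reconstructed here.

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;

public class SessionAggregateTest extends AbstractTestBase {

    @Test
    public void testSessionAggregate() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        CollectSink.VALUES.clear();

        // (key, event-time millis); the 40-minute gap after the second element
        // separates the input into two sessions for key "a".
        env.fromElements(
                Tuple2.of("a", 1_000L),
                Tuple2.of("a", 2_000L),
                Tuple2.of("a", 1_000L + Time.minutes(40).toMilliseconds()))
            .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                @Override
                public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                    return element.f1;
                }
            })
            .keyBy(value -> value.f0)
            .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
            .aggregate(new CountEvents())
            .addSink(new CollectSink());

        // When the bounded input ends, the watermark advances to the end, so
        // both session windows fire and their results land in the sink.
        env.execute();

        assertEquals(Arrays.asList(2L, 1L), CollectSink.VALUES);
    }

    /** Counts events per session window. */
    static class CountEvents implements AggregateFunction<Tuple2<String, Long>, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Tuple2<String, Long> v, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }

    /** Test sink that stores results in a static list, as in Flink's testing documentation. */
    static class CollectSink implements SinkFunction<Long> {
        static final List<Long> VALUES = new CopyOnWriteArrayList<>();
        @Override public void invoke(Long value, SinkFunction.Context context) {
            VALUES.add(value);
        }
    }
}
```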

How to use multiple counters in Flink

Submitted by ε祈祈猫儿з on 2021-02-08 07:57:04

Question: (Somewhat related to "How to create dynamic metric in Flink".) I have a stream of events (someid: String, name: String), and for monitoring reasons I need one counter per event ID. In all the Flink documentation and examples, the counter is initialised with a fixed name, for instance in the open() of a map function. In my case I cannot initialise the counter that way, because I need one per eventId and I do not know the values in advance. Also, I understand how expensive it would be to create a new …
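The workaround usually suggested for this situation is to create each counter lazily the first time its event ID is seen and cache it in the operator, so registration happens once per ID and parallel instance. A sketch, assuming a hypothetical Event type with someId and name fields (not from the question):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

/** One counter per distinct event id, created on demand and cached. */
public class PerEventIdCounter extends RichMapFunction<PerEventIdCounter.Event, PerEventIdCounter.Event> {

    // Cache of metric objects for the ids this parallel instance has seen.
    private transient Map<String, Counter> counters;

    @Override
    public void open(Configuration parameters) {
        counters = new HashMap<>();
    }

    @Override
    public Event map(Event event) {
        counters
            .computeIfAbsent(event.someId,
                id -> getRuntimeContext()
                        .getMetricGroup()
                        .addGroup("eventId", id)   // the id becomes part of the metric scope
                        .counter("count"))
            .inc();
        return event;
    }

    /** Hypothetical event type matching the (someid, name) stream in the question. */
    public static class Event {
        public String someId;
        public String name;
    }
}
```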

How to display intermediate results in a windowed streaming-etl?

Submitted by 青春壹個敷衍的年華 on 2021-02-08 07:42:45

Question: We currently do real-time aggregation of data in an event store. The idea is to visualize transaction data for multiple time ranges (monthly, weekly, daily, hourly) and for multiple nominal keys. We regularly have late data, so we need to account for that. Furthermore, the requirement is to display "running" results, that is, the value of the current window even before it is complete. Currently we use Kafka and Apache Storm (specifically Trident, i.e. micro-batches) to do this. Our …
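One common way to get running results plus late-data refinement in Flink is to combine a periodic trigger with allowedLateness; a minimal sketch follows. The keys, amounts, and intervals are invented, and the downstream sink has to treat repeated emissions for the same key and window as updates rather than appends:

```java
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

public class RunningWindowTotals {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // (nominal key, amount, event-time millis); stands in for the Kafka source.
        env.fromElements(
                Tuple3.of("store-1", 10.0, 1_000L),
                Tuple3.of("store-1", 5.0, 2_000L),
                Tuple3.of("store-2", 7.5, 3_000L))
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, Double, Long>>(Time.minutes(5)) {
                    @Override
                    public long extractTimestamp(Tuple3<String, Double, Long> t) { return t.f2; }
                })
            .keyBy(value -> value.f0)
            .window(TumblingEventTimeWindows.of(Time.hours(1)))
            // Emit an updated running total every 10 s of wall-clock time instead of
            // waiting for the watermark to close the window ...
            .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
            // ... and keep window state around so late elements refine the result.
            .allowedLateness(Time.hours(1))
            .sum(1)
            .print();

        env.execute("running-window-totals");
    }
}
```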

Flink KafkaProducer sends duplicate messages in exactly-once mode after checkpoint restore

Submitted by 喜夏-厌秋 on 2021-02-08 07:27:17

Question: I am writing a test case for Flink's two-phase commit; below is an overview. "sink kafka" is an exactly-once Kafka producer. "sink step" is a MySQL sink that extends the two-phase-commit sink. "sink compare" is also a MySQL sink that extends the two-phase-commit sink, and it occasionally throws an exception to simulate a failed checkpoint. When a checkpoint fails and the job restores, I find that the MySQL two-phase commit works fine, but the Kafka consumer reads offsets from the last success and the Kafka producer produces messages even though it had already done so before …
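The duplicates described here are usually explained on the reading side: records written between the last successful checkpoint and the failure are aborted by the exactly-once producer, but a consumer with the default isolation level (read_uncommitted) still sees them. A sketch of a plain Kafka consumer configured to hide aborted transactional records; the bootstrap servers, group id, and topic are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "verify-exactly-once");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // With the default "read_uncommitted", the consumer also sees records that the
        // EXACTLY_ONCE producer wrote but later aborted after a failed checkpoint,
        // which looks like duplication. "read_committed" hides them.
        props.setProperty("isolation.level", "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("output-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```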
