apache-flink | 易学教程

Flink checkpoints keeps failing

Question: We are trying to set up a stateful Flink job using the RocksDB backend. We use session windows with a 30-minute gap. We use an AggregateFunction, so we are not using any Flink state variables directly. With sampling, we have fewer than 20k events/s and 20 to 30 new sessions/s. Our session basically gathers all the events, so the size of the session accumulator grows over time. We are using 10G of memory in total with Flink 1.9 and 128 containers. The settings are as follows: state.backend: rocksdb state.checkpoints.dir: hdfs:
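To make the growth concern in this excerpt concrete, here is a minimal plain-Java sketch of session-gap grouping with an accumulator that keeps every event. It does not use the Flink API: the class `SessionGapSketch`, the `Session` type, and the `sessionize` method are hypothetical names introduced only for illustration, and the handling of a gap exactly equal to the threshold is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of session-gap grouping. This is NOT Flink API.
final class SessionGapSketch {

    /** Hypothetical accumulator that keeps every event timestamp of one session. */
    static final class Session {
        final List<Long> eventTimes = new ArrayList<>();

        long lastEventTime() {
            return eventTimes.get(eventTimes.size() - 1);
        }
    }

    /**
     * Groups timestamps (milliseconds, assumed sorted ascending) into sessions.
     * A new session starts when the gap to the previous event exceeds gapMillis
     * (boundary handling is an assumption of this sketch).
     */
    static List<Session> sessionize(long[] sortedTimes, long gapMillis) {
        List<Session> sessions = new ArrayList<>();
        Session current = null;
        for (long t : sortedTimes) {
            if (current == null || t - current.lastEventTime() > gapMillis) {
                current = new Session();
                sessions.add(current);
            }
            // The accumulator stores every event, so its size grows for as
            // long as the session stays open.
            current.eventTimes.add(t);
        }
        return sessions;
    }
}
```

Because the accumulator holds all events, a session that never sees a 30-minute pause keeps growing, which is consistent with the state growth the question describes.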

How to properly test a Flink window function?

Question: Does anyone know how to test windowing functions in Flink? I am using the dependency flink-test-utils_2.11. My steps are: get the StreamExecutionEnvironment; create objects and add them to the environment; do a keyBy; add a session window; execute an aggregate function. public class AggregateVariantCEVTest extends AbstractTestBase { @Test public void testAggregateVariantCev() throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1

How to use multiple counters in Flink

Question: (Somewhat related to "How to create dynamic metric in Flink".) I have a stream of events (someid: String, name: String) and, for monitoring reasons, I need a counter per event ID. In all the Flink documentation and examples, I can see that the counter is, for instance, initialised with a name in the open method of a map function. But in my case I cannot initialise the counter there, because I need one per eventId and I do not know the values in advance. Also, I understand how expensive it would be to create a new
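The idea of one counter per event ID, with IDs unknown in advance, can be sketched in plain Java as a map of lazily created counters. This is only an illustration of the pattern, not Flink's metrics API: the class `PerIdCounters` and its methods are hypothetical names, and wiring such counters into Flink's metric system is outside what this sketch shows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Plain-Java illustration of lazily created per-ID counters. NOT Flink's metrics API.
final class PerIdCounters {

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Increments the counter for eventId, creating it on first use only. */
    void increment(String eventId) {
        counters.computeIfAbsent(eventId, id -> new LongAdder()).increment();
    }

    /** Returns the current count for eventId, or 0 if the ID was never seen. */
    long count(String eventId) {
        LongAdder counter = counters.get(eventId);
        return counter == null ? 0L : counter.sum();
    }

    /** Number of distinct IDs seen so far; this grows without bound if IDs do. */
    int distinctIds() {
        return counters.size();
    }
}
```

Each counter is created once per ID and reused afterwards, so the per-event cost is a map lookup. The map itself still grows with the number of distinct IDs, which matters when the ID space is large.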

How to display intermediate results in a windowed streaming-etl?

Question: We currently do real-time aggregation of data in an event store. The idea is to visualize transaction data for multiple time ranges (monthly, weekly, daily, hourly) and for multiple nominal keys. We regularly have late data, so we need to account for that. Furthermore, the requirement is to display "running" results, that is, the value of the current window even before it is complete. Currently we are using Kafka and Apache Storm (specifically Trident, i.e. microbatches) to do this. Our

flink kafkaproducer send duplicate message in exactly once mode when checkpoint restore

Question: I am writing a test case for Flink's two-phase commit; an overview follows. The sink "kafka" is an exactly-once Kafka producer. The sink "step" is a MySQL sink that extends the two-phase commit sink. The sink "compare" is also a MySQL sink that extends the two-phase commit sink, and it occasionally throws an exception to simulate a failed checkpoint. When a checkpoint fails and the job is restored, I find that the MySQL two-phase commit works fine, but the Kafka consumer reads from the offset of the last successful checkpoint and the Kafka producer produces messages again even though it had already produced them before
