flink-streaming

Apache Flink: Where do State Backends keep the state?

杀马特。学长 韩版系。学妹 submitted on 2019-12-03 21:54:33
The documentation says: "Depending on your state backend, Flink can also manage the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state." https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/state_backends.html Does this mean that only when the state backend is configured to RocksDBStateBackend is the state kept in memory and possibly spilled to disk if necessary? Whereas if it is configured to MemoryStateBackend or FsStateBackend, the state is only kept in memory and …
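In short: MemoryStateBackend and FsStateBackend hold the working state as objects on the TaskManager JVM heap, while RocksDBStateBackend holds working state in RocksDB on local disk, so only RocksDB can grow beyond available memory. A minimal sketch of selecting a backend in code, assuming the Flink 1.x Java API; the checkpoint URI below is a placeholder:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StateBackendExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // RocksDB keeps working state on local disk (outside the JVM heap) and can
            // spill beyond memory; checkpoints are written to the given URI (placeholder).
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

            // With FsStateBackend the working state stays on the TaskManager heap and
            // only checkpoints go to the file system:
            // env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        }
    }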

Flink BucketingSink with Custom AvroParquetWriter create empty file

一曲冷凌霜 submitted on 2019-12-03 17:34:19
I have created a writer for BucketingSink. The sink and writer work without error, but when the writer writes Avro GenericRecord to Parquet, the file goes from in-progress to pending to completed, yet the files are empty (0 bytes). Can anyone tell me what is wrong with the code? I have tried placing the initialization of the AvroParquetWriter in the open() method, but the result is still the same. When debugging the code, I confirmed that writer.write(element) does execute and that element contains the Avro GenericRecord data. Streaming data:

    BucketingSink<DataEventRecord> sink = new …
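A common cause of 0-byte output in this setup is that the underlying ParquetWriter buffers rows in memory and only writes the Parquet footer when it is closed, so a custom Writer has to close it in its own close(). Below is a minimal, hypothetical sketch of such a writer against Flink's legacy BucketingSink Writer interface; the class name ParquetSinkWriter and the schema handling are assumptions, not the poster's code:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.streaming.connectors.fs.Writer;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    import java.io.IOException;

    // Hypothetical sketch of a Parquet writer for BucketingSink.
    public class ParquetSinkWriter implements Writer<GenericRecord> {
        private static final long serialVersionUID = 1L;

        // Keep the schema as a String because org.apache.avro.Schema is not Serializable.
        private final String schemaString;
        private transient ParquetWriter<GenericRecord> writer;

        public ParquetSinkWriter(String schemaString) {
            this.schemaString = schemaString;
        }

        @Override
        public void open(FileSystem fs, Path path) throws IOException {
            Schema schema = new Schema.Parser().parse(schemaString);
            this.writer = AvroParquetWriter.<GenericRecord>builder(path)
                    .withSchema(schema)
                    .build();
        }

        @Override
        public void write(GenericRecord element) throws IOException {
            writer.write(element);
        }

        @Override
        public long flush() throws IOException {
            // ParquetWriter buffers rows until a row group is complete; there is no
            // partial flush, so just report the currently buffered size.
            return writer.getDataSize();
        }

        @Override
        public long getPos() throws IOException {
            return writer.getDataSize();
        }

        @Override
        public void close() throws IOException {
            // Closing writes the Parquet footer; skipping this leaves a 0-byte file.
            if (writer != null) {
                writer.close();
            }
        }

        @Override
        public Writer<GenericRecord> duplicate() {
            return new ParquetSinkWriter(schemaString);
        }
    }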

Flink Streaming: How to output one data stream to different outputs depending on the data?

烂漫一生 submitted on 2019-12-03 05:01:17
In Apache Flink I have a stream of tuples. Let's assume a really simple Tuple1<String>. The tuple can have an arbitrary value in its value field (e.g. 'P1', 'P2', etc.). The set of possible values is finite, but I don't know the full set beforehand (so there could be a 'P362'). I want to write each tuple to a certain output location depending on the value inside the tuple. For example, I would like to end up with the following file structure: /output/P1, /output/P2. In the documentation I only found possibilities to write to locations that I know beforehand (e.g. stream.writeCsv("/output/somewhere")), …
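One way to get per-value directories without knowing the values up front is to let a file sink derive the bucket (sub-directory) from each element. A sketch using StreamingFileSink with a custom BucketAssigner, which assumes Flink 1.6+ rather than the version in the original post, and requires checkpointing to be enabled so part files get finalized:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.api.common.serialization.SimpleVersionedSerializer;
    import org.apache.flink.api.java.tuple.Tuple1;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

    public class ValueBucketingExample {
        public static void addSink(DataStream<Tuple1<String>> stream) {
            StreamingFileSink<Tuple1<String>> sink = StreamingFileSink
                    .forRowFormat(new Path("/output"), new SimpleStringEncoder<Tuple1<String>>())
                    .withBucketAssigner(new BucketAssigner<Tuple1<String>, String>() {
                        @Override
                        public String getBucketId(Tuple1<String> element, BucketAssigner.Context context) {
                            // Each distinct value (P1, P2, ..., P362) becomes its own sub-directory.
                            return element.f0;
                        }

                        @Override
                        public SimpleVersionedSerializer<String> getSerializer() {
                            return SimpleVersionedStringSerializer.INSTANCE;
                        }
                    })
                    .build();
            stream.addSink(sink);
        }
    }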

Unable to execute CEP pattern in Flink dashboard version 1.3.2 which is caused by ClassNotFoundException

泪湿孤枕 submitted on 2019-12-02 10:30:52
I have written a simple pattern like this:

    Pattern<JoinedEvent, ?> pattern = Pattern.<JoinedEvent>begin("start")
            .where(new SimpleCondition<JoinedEvent>() {
                @Override
                public boolean filter(JoinedEvent streamEvent) throws Exception {
                    return streamEvent.getRRInterval() >= 10;
                }
            })
            .within(Time.milliseconds(WindowLength));

It executes well in IntelliJ IDEA. I am using Flink 1.3.2 both in the dashboard and in IntelliJ IDEA. While I was building Flink from source, I saw a lot of warning messages which led me to believe that the iterative condition classes have not been included in a jar, as the error …
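A frequent cause of this kind of ClassNotFoundException is that flink-cep is not part of the Flink distribution's lib directory, so it has to be bundled into the fat jar submitted through the dashboard (or dropped into the cluster's lib). For completeness, a sketch of applying such a pattern with the Flink 1.3 CEP API; JoinedEvent and getRRInterval() are taken from the excerpt, the rest is assumed:

    import java.util.List;
    import java.util.Map;

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class ApplyPattern {
        public static DataStream<String> apply(DataStream<JoinedEvent> events,
                                                Pattern<JoinedEvent, ?> pattern) {
            // Compile the pattern against the event stream.
            PatternStream<JoinedEvent> patternStream = CEP.pattern(events, pattern);

            // "start" must match the name used in Pattern.begin("start").
            return patternStream.select(new PatternSelectFunction<JoinedEvent, String>() {
                @Override
                public String select(Map<String, List<JoinedEvent>> match) {
                    return "matched RR interval: " + match.get("start").get(0).getRRInterval();
                }
            });
        }
    }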

java.io.NotSerializableException using Apache Flink with Lagom

寵の児 submitted on 2019-12-02 09:59:48
I am writing a Flink CEP program inside a Lagom microservice implementation. My Flink CEP program runs perfectly fine as a simple Scala application, but when I use the same code inside the Lagom service implementation I receive the following exception. Lagom service implementation:

    override def start = ServiceCall[NotUsed, String] {
      val env = StreamExecutionEnvironment.getExecutionEnvironment
      var executionConfig = env.getConfig
      env.setParallelism(1)
      executionConfig.disableSysoutLogging()
      var topic_name = "topic_test"
      var props = new Properties
      props.put("bootstrap.servers", "localhost:9092")
      props …
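As a general note, not taken from the post: java.io.NotSerializableException in Flink usually means a user function was declared as an anonymous class or lambda inside a non-serializable enclosing object (here, the Lagom service implementation), so the whole enclosing instance is dragged into the function's closure. A minimal sketch of the usual remedy, with hypothetical class names, shown in Java for brevity:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class SerializableFunctions {

        // A static nested (or top-level) class captures nothing from the outer service,
        // so Flink can serialize it and ship it to the TaskManagers.
        public static class ParseEvent implements MapFunction<String, String> {
            private static final long serialVersionUID = 1L;

            @Override
            public String map(String value) {
                return value.trim();
            }
        }

        public static DataStream<String> parse(DataStream<String> input) {
            // Declaring this function anonymously inside the service implementation would
            // pull the non-serializable service into its closure and fail at submission.
            return input.map(new ParseEvent());
        }
    }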

Apache Flink - simple windowing job problem - java.lang.RuntimeException: segment has been freed - MiniCluster problem

自作多情 submitted on 2019-12-02 08:41:17
Apache Flink - simple windowing job problem - java.lang.RuntimeException: segment has been freed. Hi, I am a Flink newbie, and in my job I am trying to use windowing simply to aggregate elements and enable delayed processing:

    src = src.timeWindowAll(Time.milliseconds(1000))
             .process(new BaseDelayingProcessAllWindowFunctionImpl());

The process-all-window function simply collects the input elements:

    public class BaseDelayingProcessAllWindowFunction<IN>
            extends ProcessAllWindowFunction<IN, IN, TimeWindow> {
        private static final long serialVersionUID = 1L;
        protected Logger logger;
        public …
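For reference, a minimal hypothetical completion of such a pass-through window function, which just re-emits every buffered element when the window fires; this is a sketch of what the excerpt describes, not the poster's actual code:

    import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class BaseDelayingProcessAllWindowFunction<IN>
            extends ProcessAllWindowFunction<IN, IN, TimeWindow> {

        private static final long serialVersionUID = 1L;

        @Override
        public void process(Context context, Iterable<IN> elements, Collector<IN> out) {
            // Forward every element buffered during the window, delaying
            // processing by the window length (1000 ms in the excerpt).
            for (IN element : elements) {
                out.collect(element);
            }
        }
    }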

Flink: How to pass extra JVM options to TaskManager and JobManager

我与影子孤独终老i submitted on 2019-12-02 03:22:41
Question: I am trying to submit a Flink job on YARN using the command below:

    /usr/flink-1.3.2/bin/flink run -yd -yn 1 -ynm MyApp -ys 1 -yqu default -m yarn-cluster \
        -c com.mycompany.Driver -j /usr/myapp.jar \
        -Denv.java.opts="-Dzkconfig.parent /app-config_127.0.0.1 -Dzk.hosts localhost:2181 -Dsax.zookeeper.root /app"

I can see env.java.opts in the Flink client log, but when the application is submitted to YARN these Java options are not available. Because the extra JVM options are missing, the application throws …
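One commonly suggested approach, given here as an assumption rather than taken from the post, is to pass env.java.opts to the YARN deployment as a Flink dynamic property with -yD (or set it in conf/flink-conf.yaml on the submitting client), so it reaches the JobManager and TaskManager JVMs instead of only the client process:

    # Dynamic property for the YARN deployment (paths and property values simplified
    # from the poster's command for illustration):
    /usr/flink-1.3.2/bin/flink run -m yarn-cluster -yd -yn 1 -ynm MyApp -ys 1 -yqu default \
        -yD env.java.opts="-Dzk.hosts=localhost:2181" \
        -c com.mycompany.Driver -j /usr/myapp.jar

    # Or, equivalently, in conf/flink-conf.yaml on the client that submits the job:
    # env.java.opts: -Dzk.hosts=localhost:2181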