apache-flink

How to check whether a DataStream in Flink is empty or has data

╄→гoц情女王★ submitted on 2021-01-29 10:33:01
Question: I am new to Apache Flink. I have a DataStream that goes through a process function; if certain conditions are met the data is valid, and if the conditions are not met I write it to a side output. I am able to print the DataStream. Is it possible to check whether the DataStream is empty or null? I tried the datastream.equals(null) method but it does not work. Please suggest how to tell whether a DataStream is empty or not. Answer 1: By "empty", I assume you mean that no data is flowing. What are
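
A minimal sketch of the routing the question describes, i.e. a ProcessFunction that keeps valid records on the main stream and sends everything else to a side output. "Event", isValid(...) and the stream variable names are placeholders, not code from the question:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // Inside the job's main method, given some DataStream<Event> input:
    final OutputTag<Event> invalidTag = new OutputTag<Event>("invalid") {};

    SingleOutputStreamOperator<Event> valid = input.process(new ProcessFunction<Event, Event>() {
        @Override
        public void processElement(Event e, Context ctx, Collector<Event> out) {
            if (isValid(e)) {
                out.collect(e);            // valid records stay on the main stream
            } else {
                ctx.output(invalidTag, e); // everything else goes to the side output
            }
        }
    });

    DataStream<Event> invalid = valid.getSideOutput(invalidTag);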

About StateTtlConfig

假装没事ソ submitted on 2021-01-29 10:02:24
Question: I'm configuring a StateTtlConfig for a MapState. What I want is for the objects in the state to have, for example, 3 hours of life and then disappear from the state, be handed to the GC to be cleaned up and release some memory, and the checkpoints should lose some weight too, I think. I had this configuration before and it does not seem to be working, because the checkpoints kept growing: private final StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(org.apache.flink.api
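
A minimal sketch of a 3-hour TTL attached to a MapState descriptor, assuming a String key and a hypothetical MyValue type. Depending on the Flink version, expired entries may only be dropped from snapshots when a cleanup strategy such as cleanupFullSnapshot() is configured, which would be consistent with checkpoints that keep growing:

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.time.Time;

    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.hours(3))                                  // entries live for 3 hours
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)  // TTL refreshed on writes
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .cleanupFullSnapshot()                                      // drop expired entries from full snapshots
            .build();

    MapStateDescriptor<String, MyValue> descriptor =
            new MapStateDescriptor<>("myMapState", String.class, MyValue.class);
    descriptor.enableTimeToLive(ttlConfig);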

Apache Flink - Partitioning the stream equally as the input Kafka topic

余生颓废 submitted on 2021-01-29 09:46:30
Question: I would like to implement the following scenario in Apache Flink: given a Kafka topic with 4 partitions, I would like to process the intra-partition data independently in Flink using different logic, depending on the event's type. In particular, suppose the input Kafka topic contains the events depicted in the previous images. Each event has a different structure: partition 1 has the field "a" as key, partition 2 has the field "b" as key, etc. In Flink I would like to apply different
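
One simple way to apply different logic per event type is to split the consumed stream on the field that identifies each type. This is only a sketch: "Event", hasField(...), LogicForA and LogicForB are placeholders, while the field names "a" and "b" come from the question:

    import org.apache.flink.streaming.api.datastream.DataStream;

    // events is the DataStream<Event> read from the Kafka topic
    DataStream<Event> typeA = events.filter(e -> e.hasField("a"));  // partition-1 style events
    DataStream<Event> typeB = events.filter(e -> e.hasField("b"));  // partition-2 style events

    typeA.map(new LogicForA()).print();  // replace print() with the real per-type pipeline
    typeB.map(new LogicForB()).print();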

Flink: what's the best way to handle exceptions inside Flink jobs

China☆狼群 submitted on 2021-01-29 09:27:15
Question: I have a Flink job that takes in Kafka topics and goes through a bunch of operators. I'm wondering what the best way is to deal with exceptions that happen in the middle. My goal is to have a centralized place to handle those exceptions that may be thrown from different operators, and here is my current solution: use a ProcessFunction and, in the catch block, output to a side output via the context whenever there is an exception, and have a separate sink function for the side output at the end where it calls
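
A sketch of the pattern the question describes: catch inside processElement, push a description of the failure to a side output, and attach a single sink to that side output. "Input", "Output", transform(...) and ErrorSink are placeholders:

    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    final OutputTag<String> errorTag = new OutputTag<String>("errors") {};

    SingleOutputStreamOperator<Output> result = source.process(new ProcessFunction<Input, Output>() {
        @Override
        public void processElement(Input value, Context ctx, Collector<Output> out) {
            try {
                out.collect(transform(value));
            } catch (Exception e) {
                // Keep the job running and route the failure to the side output instead.
                ctx.output(errorTag, value + " failed: " + e.getMessage());
            }
        }
    });

    result.getSideOutput(errorTag).addSink(new ErrorSink());  // one central place for all captured errors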

Unable to execute HTTP request: Timeout waiting for connection from pool in Flink

≯℡__Kan透↙ submitted on 2021-01-29 09:02:26
Question: I'm working on an app which uploads some files to an S3 bucket and, at a later point, reads files from the S3 bucket and pushes them to my database. I'm using Flink 1.4.2 and the fs.s3a API for reading and writing files from the S3 bucket. Uploading files to the S3 bucket works fine without any problem, but when the second phase of my app, which reads those uploaded files from S3, starts, my app throws the following error: Caused by: java.io.InterruptedIOException: Reopen at position 0 on s3a:/
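
"Timeout waiting for connection from pool" usually means the s3a client's HTTP connection pool is exhausted, often because S3 input streams are opened faster than they are closed. A hedged sketch of raising the pool limits via the Hadoop s3a settings (the property names are standard Hadoop s3a options; the values are only examples, and closing every stream is still required):

    <!-- core-site.xml (or wherever the Hadoop configuration used by Flink lives) -->
    <property>
      <name>fs.s3a.connection.maximum</name>
      <value>100</value>
    </property>
    <property>
      <name>fs.s3a.threads.max</name>
      <value>20</value>
    </property>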

Issue with job submission from Flink Job UI (Exception:org.apache.flink.client.program.OptimizerPlanEnvironment$ProgramAbortException)

强颜欢笑 submitted on 2021-01-29 08:01:43
Question: I have simple Java code for a Flink job: List<Tuple2> list = new ArrayList<>(); for (int i = 0; i < 10; i++) { list.add(new Tuple2(Integer.valueOf(i), "test" + i)); } StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.fromCollection(list).print(); env.execute("job1"); I packaged this code and created a jar, say flink-processor-0.1-SNAPSHOT.jar, and uploaded it to the JobManager from the Submit Job UI. No issues with the upload. I see the EntryClass has the main class (com.abc
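
For reference, a self-contained version of the snippet from the question, with explicit Tuple2<Integer, String> type parameters; the class name is a placeholder:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class Job1 {
        public static void main(String[] args) throws Exception {
            List<Tuple2<Integer, String>> list = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                list.add(new Tuple2<>(i, "test" + i));
            }
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromCollection(list).print();
            env.execute("job1");
        }
    }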

How to configure external jar libraries for the Flink Docker container

纵然是瞬间 submitted on 2021-01-29 07:59:20
Question: I am running a Flink Docker image with the following configuration:

    version: '2.1'
    services:
      jobmanager:
        build: .
        image: flink
        volumes:
          - .:/usr/local/lib/python3.7/site-packages/pyflink/lib
        hostname: "jobmanager"
        expose:
          - "6123"
        ports:
          - "8081:8081"
        command: jobmanager
        environment:
          - JOB_MANAGER_RPC_ADDRESS=jobmanager
      taskmanager:
        image: flink
        volumes:
          - .:/usr/local/lib/python3.7/site-packages/pyflink/lib
        expose:
          - "6121"
          - "6122"
        depends_on:
          - jobmanager
        command: taskmanager
        links:
          -
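
A common way to make external jars visible to the Flink processes (assuming the stock flink image, which keeps its libraries under /opt/flink/lib) is to mount each extra jar into that directory on both services, by adding entries to the existing volumes lists. The host path and jar name below are placeholders:

    volumes:
      - .:/usr/local/lib/python3.7/site-packages/pyflink/lib
      - ./ext-jars/my-connector.jar:/opt/flink/lib/my-connector.jar  # hypothetical external jar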

Create an input format for Elasticsearch using the Flink RichInputFormat

感情迁移 submitted on 2021-01-29 07:06:09
Question: We are using Elasticsearch 6.8.4 and Flink 1.0.18. We have an index with 1 shard and 1 replica in Elasticsearch, and I want to create a custom input format to read and write data in Elasticsearch using the Apache Flink DataSet API with more than one input split, in order to achieve better performance. So is there any way I can achieve this requirement? Note: the per-document size is large (almost 8 MB) and I can read only 10 documents at a time because of the size constraint, and per read request, we
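
A skeleton of a custom input format that exposes several input splits to the DataSet API, one per Elasticsearch scroll slice. This is only a sketch: the class name and the use of JSON strings as the record type are assumptions, and the Elasticsearch sliced-scroll calls are left as comments:

    import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
    import org.apache.flink.api.common.io.RichInputFormat;
    import org.apache.flink.api.common.io.statistics.BaseStatistics;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.io.GenericInputSplit;
    import org.apache.flink.core.io.InputSplitAssigner;

    public class ElasticsearchInputFormat extends RichInputFormat<String, GenericInputSplit> {

        private final int numSlices;                              // parallel Elasticsearch scroll slices
        private transient java.util.Iterator<String> currentPage; // documents of the current scroll page

        public ElasticsearchInputFormat(int numSlices) {
            this.numSlices = numSlices;
        }

        @Override
        public void configure(Configuration parameters) { }

        @Override
        public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
            return cachedStatistics;
        }

        @Override
        public GenericInputSplit[] createInputSplits(int minNumSplits) {
            int n = Math.max(minNumSplits, numSlices);
            GenericInputSplit[] splits = new GenericInputSplit[n];
            for (int i = 0; i < n; i++) {
                splits[i] = new GenericInputSplit(i, n);
            }
            return splits;
        }

        @Override
        public InputSplitAssigner getInputSplitAssigner(GenericInputSplit[] splits) {
            return new DefaultInputSplitAssigner(splits);
        }

        @Override
        public void open(GenericInputSplit split) {
            // Start a sliced scroll for slice id = split.getSplitNumber() out of
            // split.getTotalNumberOfSplits(), with a small page size (e.g. 10) because of
            // the ~8 MB documents, and remember the scroll id for paging.
        }

        @Override
        public boolean reachedEnd() {
            // Fetch the next scroll page here when the current one is exhausted.
            return currentPage == null || !currentPage.hasNext();
        }

        @Override
        public String nextRecord(String reuse) {
            return currentPage.next();
        }

        @Override
        public void close() {
            // Clear the scroll and close the Elasticsearch client.
        }
    }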

Apache Flink Mapping at Runtime

霸气de小男生 submitted on 2021-01-29 07:01:17
Question: I have built a Flink streaming job that reads an XML file from Kafka, converts the file, and writes it to a database. As the attributes in the XML file don't match the database column names, I built a switch case for the mapping. As this is not really flexible, I want to take this hard-wired mapping information out of the code. First of all I came up with the idea of a mapping file which could look like this: path.in.xml.to.attribut=database.column.name The current job logic looks like this: switch
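
A sketch of replacing the hard-wired switch with a lookup table loaded from such a properties file. The file name mapping.properties is a placeholder; in a Flink job this would typically run once in the open() method of a rich function:

    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    // Load "xml path -> database column" pairs from the mapping file.
    Properties props = new Properties();
    try (InputStream in = getClass().getResourceAsStream("/mapping.properties")) {
        props.load(in);
    }
    Map<String, String> xmlPathToColumn = new HashMap<>();
    for (String key : props.stringPropertyNames()) {
        xmlPathToColumn.put(key, props.getProperty(key));
    }

    // At runtime, one lookup replaces one branch of the switch:
    String column = xmlPathToColumn.get("path.in.xml.to.attribut");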
