apache-flink

How to load external jars in Flink

Submitted by 这一生的挚爱 on 2021-01-29 06:06:02
Question: When I submit jobs to Flink in standalone cluster mode, I find that each time the taskManager fetches the jar from the jobManager (even for the same jar), which takes a long time. I am wondering whether it is possible to keep these jars on each worker node so that they automatically load the jars locally for each run.

Answer 1: When you deploy your cluster, any jar in the lib folder will be available to the nodes in your cluster. The lib folder is the one that typically contains the flink-dist jar and the rest of the Flink runtime dependencies.
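As an illustration of where such jars go, here is a sketch of a typical Flink distribution layout (the connector jar name is a placeholder); a jar copied into lib/ on every node before the cluster starts is then loaded locally rather than fetched from the jobManager on each run:

    flink-1.x/
        bin/
        conf/
        lib/
            flink-dist_2.11-1.x.jar     <- shipped with Flink
            my-connector.jar            <- user jar placed on every node's classpath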

Detect absence of a certain event

Submitted by 萝らか妹 on 2021-01-28 17:50:06
Question: In the documentation of FlinkCEP, I found that I can enforce that a particular event doesn't occur between two other events using notFollowedBy or notNext. However, I was wondering if I could detect the absence of a certain event after a time X. For example, if an event A is not followed by another event A within 10 seconds, fire an alert or do something. Would it be possible to define a FlinkCEP pattern to capture that situation? Thanks in advance, Humberto

Answer 1: Although Flink CEP does not allow a pattern to end with notFollowedBy, you can detect the absence of an event by putting a time constraint on the pattern with within() and reacting to the partial matches that time out.
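The answer is truncated above; the standard technique it most likely refers to is to model "A followed by A within 10 seconds" and treat the timed-out partial matches as the absence signal. Below is a minimal hedged sketch against the Flink 1.7+ CEP API; the Event POJO with getType()/getId() and the String alerts are assumptions for illustration:

    import java.util.List;
    import java.util.Map;

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternFlatSelectFunction;
    import org.apache.flink.cep.PatternFlatTimeoutFunction;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class AbsenceDetection {

        public static DataStream<String> alerts(DataStream<Event> events) {
            // Match "A followed by A within 10s"; a partial match that times
            // out is exactly the case "no second A within 10 seconds".
            Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return "A".equals(e.getType()); }
                    })
                    .followedBy("second")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return "A".equals(e.getType()); }
                    })
                    .within(Time.seconds(10));

            PatternStream<Event> ps = CEP.pattern(events.keyBy(Event::getId), pattern);

            OutputTag<String> timedOut = new OutputTag<String>("timed-out") {};

            SingleOutputStreamOperator<String> matched = ps.flatSelect(
                    timedOut,
                    new PatternFlatTimeoutFunction<Event, String>() {
                        @Override
                        public void timeout(Map<String, List<Event>> partial, long ts,
                                            Collector<String> out) {
                            out.collect("No follow-up A within 10s after " + partial.get("first").get(0));
                        }
                    },
                    new PatternFlatSelectFunction<Event, String>() {
                        @Override
                        public void flatSelect(Map<String, List<Event>> match, Collector<String> out) {
                            // The second A arrived in time: nothing to emit.
                        }
                    });

            // The alerts live on the side output carrying the timed-out matches.
            return matched.getSideOutput(timedOut);
        }
    }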

Flink Table API & SQL and map types (Scala)

Submitted by 怎甘沉沦 on 2021-01-28 12:42:23
Question: I am using Flink's Table API and/or Flink's SQL support (Flink 1.3.1, Scala 2.11) in a streaming environment. I'm starting with a DataStream[Person], where Person is a case class that looks like:

    Person(name: String, age: Int, attributes: Map[String, String])

All is working as expected until I start to bring attributes into the picture. For example:

    val result = streamTableEnvironment.sql(
      """
        |SELECT
        |  name,
        |  attributes['foo'],
        |  TUMBLE_START(rowtime, INTERVAL '1' MINUTE)
        |FROM myTable
        |GROUP BY name, TUMBLE(rowtime, INTERVAL '1' MINUTE)
      """.stripMargin)

Flink SQL Client connect to non local cluster

Submitted by 拜拜、爱过 on 2021-01-28 11:51:04
Question: Is it possible to connect the Flink SQL client to a remote cluster? I assume the client uses some configuration to determine the job manager address, but I don't see it mentioned in the docs.

Answer 1: Yes, that's possible. You can configure the connection to a remote cluster in the conf/flink-conf.yaml file:

    jobmanager.rpc.address: localhost
    jobmanager.rpc.port: 6123

Source: https://stackoverflow.com/questions/61623836/flink-sql-client-connect-to-non-local-cluster
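For a remote cluster you would point those same keys at the remote JobManager instead of localhost, e.g. (the hostname is hypothetical):

    jobmanager.rpc.address: jobmanager.example.com
    jobmanager.rpc.port: 6123

and then start the client with bin/sql-client.sh embedded, which picks up conf/flink-conf.yaml from the same distribution.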

KafkaAvroDeserializer failing with Kryo Exception

Submitted by 谁说我不能喝 on 2021-01-28 11:02:00
Question: I have written a consumer to read Avro generic records using a schema registry.

    FlinkKafkaConsumer010 kafkaConsumer010 =
        new FlinkKafkaConsumer010(KAFKA_TOPICS,
            new KafkaGenericAvroDeserializationSchema(schemaRegistryUrl),
            properties);

And the deserialization class looks like this:

    public class KafkaGenericAvroDeserializationSchema
            implements KeyedDeserializationSchema<GenericRecord> {

        private final String registryUrl;
        private transient KafkaAvroDeserializer inner;

        public ...
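The class is cut off above; below is a minimal hedged completion of such a schema, a sketch rather than the asker's actual code (the lazy construction of the Confluent deserializer is an assumption; KafkaAvroDeserializer is not serializable, hence the transient field). Note getProducedType(): extractor-based type information for GenericRecord degrades to a generic type, so Flink ships records between operators with Kryo, which is the usual origin of the Kryo exception in the title:

    import java.io.IOException;

    import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
    import io.confluent.kafka.serializers.KafkaAvroDeserializer;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.TypeExtractor;
    import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

    public class KafkaGenericAvroDeserializationSchema
            implements KeyedDeserializationSchema<GenericRecord> {

        private final String registryUrl;
        private transient KafkaAvroDeserializer inner;

        public KafkaGenericAvroDeserializationSchema(String registryUrl) {
            this.registryUrl = registryUrl;
        }

        @Override
        public GenericRecord deserialize(byte[] messageKey, byte[] message,
                                         String topic, int partition, long offset) throws IOException {
            if (inner == null) {
                // Created lazily on the task manager: the Confluent deserializer
                // is not serializable, so it cannot travel with the job graph.
                inner = new KafkaAvroDeserializer(new CachedSchemaRegistryClient(registryUrl, 1000));
            }
            return (GenericRecord) inner.deserialize(topic, message);
        }

        @Override
        public boolean isEndOfStream(GenericRecord nextElement) {
            return false;
        }

        @Override
        public TypeInformation<GenericRecord> getProducedType() {
            // GenericRecord exposes no fields Flink can analyze, so this yields
            // a GenericTypeInfo and records are serialized with Kryo.
            return TypeExtractor.getForClass(GenericRecord.class);
        }
    }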

Combining low-latency streams with multiple meta-data streams in Flink (enrichment)

Submitted by 旧巷老猫 on 2021-01-28 08:30:48
Question: I am evaluating Flink for a streaming-analytics scenario and haven't found sufficient information on how to fulfil the kind of ETL setup we have in a legacy system today. A very common scenario is that we have keyed, slow-throughput meta-data streams that we want to use for enrichment on high-throughput data streams, something along the lines of: [diagram omitted]. This raises two questions concerning Flink: How does one enrich a fast-moving stream with slowly updating streams where the time windows overlap?
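The question is cut off, but for its first part one commonly suggested Flink building block is to connect the two streams on the enrichment key and hold the latest meta-data in keyed state; the sketch below assumes hypothetical Order, CustomerMeta and EnrichedOrder types and the Flink 1.8+ KeyedCoProcessFunction:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    public class MetaEnrichment {

        public static DataStream<EnrichedOrder> enrich(
                DataStream<Order> orders, DataStream<CustomerMeta> meta) {
            return orders
                .keyBy(o -> o.customerId)
                .connect(meta.keyBy(m -> m.customerId))
                .process(new KeyedCoProcessFunction<String, Order, CustomerMeta, EnrichedOrder>() {

                    private transient ValueState<CustomerMeta> latestMeta;

                    @Override
                    public void open(Configuration parameters) {
                        latestMeta = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("latest-meta", CustomerMeta.class));
                    }

                    @Override
                    public void processElement1(Order order, Context ctx,
                                                Collector<EnrichedOrder> out) throws Exception {
                        // Fast stream: enrich with the newest meta-data seen so far
                        // (null until the first meta record for this key arrives).
                        out.collect(new EnrichedOrder(order, latestMeta.value()));
                    }

                    @Override
                    public void processElement2(CustomerMeta m, Context ctx,
                                                Collector<EnrichedOrder> out) throws Exception {
                        // Slow stream: just remember the latest value per key.
                        latestMeta.update(m);
                    }
                });
        }
    }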

Apache Flink RollingFileAppender

Submitted by 会有一股神秘感。 on 2021-01-28 06:50:47
Question: I'm using Apache Flink v1.2. I wanted to switch to a rolling file appender to avoid huge log files containing data for several days. However, it doesn't seem to work. I adapted the log4j configuration (log4j.properties) as follows:

    log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
    log4j.appender.file.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
    log4j.appender.file.DatePattern='.' yyyy-MM-dd-a'.log'
    log4j.appender.file.MaxBackupIndex = 15
    log4j.appender.file...
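A hedged observation on the snippet above: the org.apache.log4j.rolling.* classes are not part of core log4j 1.x but of the separate apache-log4j-extras jar, so unless that jar is added to Flink's lib folder the appender class cannot even be loaded (and TimeBasedRollingPolicy is configured via a FileNamePattern, not a DatePattern). A minimal sketch that works with the stock log4j 1.x shipped with Flink is a daily rolling appender, at the cost of losing MaxBackupIndex, which DailyRollingFileAppender does not support:

    log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.file.File=${log.file}
    log4j.appender.file.DatePattern='.'yyyy-MM-dd
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n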

About core allocation for flink task manager and task slot

Submitted by 怎甘沉沦 on 2021-01-28 05:44:54
Question: I use the following command to start a Flink YARN session:

    yarn-session.sh -jm 4096 -tm 4096 -n 4 -s 2

With the above command, 4 task managers will be started (that is, 4 YARN containers, since every task manager is a YARN container), with 2 slots for each task manager. Since one task manager is one YARN container, does that mean only one core is allocated to each task manager? I have specified 2 slots per task manager, so will the two slots share only one core?
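A note on the premise, sketched against Flink's real yarn.containers.vcores option: by default Flink requests one vcore per task slot for each TaskManager container (the option defaults to the number of slots), and it can also be set explicitly in conf/flink-conf.yaml:

    # request 2 vcores per TaskManager container
    # (default: the number of task slots per TaskManager)
    yarn.containers.vcores: 2

Whether YARN actually enforces the vcore request depends on the scheduler's resource calculator; with the default memory-only calculator the setting is effectively advisory.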

Apache Flink checkpointing stuck

Submitted by 一曲冷凌霜 on 2021-01-28 02:05:57
Question: We are running a job that has ListState of between 300 GB and 400 GB, and at times a list can grow to a few thousand entries. In our use case, every item must have its own TTL, so we create a new Timer for every new item of this ListState, with a RocksDB backend on S3. That is currently about 140+ million timers (which will trigger at event.timestamp + 40 days). Our problem is that the checkpointing of the job suddenly gets stuck, or becomes VERY slow (like 1% in a few hours), until it eventually times out.
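Since every element has the same 40-day lifetime, one hedged alternative to 140+ million timers is Flink's state TTL, which for list state is checked per element and which (since Flink 1.9, an assumption here) can expire RocksDB entries during compaction instead of via timers; the caveat is that state TTL is based on processing time, not on event.timestamp:

    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.time.Time;

    // Inside open() of the rich function that owns the list state
    // (Event is a placeholder for the actual element type):
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(40))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            // expire lazily in RocksDB's compaction filter: no timer state at all
            .cleanupInRocksdbCompactFilter(1000)
            .build();

    ListStateDescriptor<Event> descriptor = new ListStateDescriptor<>("items", Event.class);
    descriptor.enableTimeToLive(ttlConfig); // for list state, TTL applies per element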

“Buffer pool is destroyed” issue found in Apache Flink flatMap Operator

Submitted by 陌路散爱 on 2021-01-27 23:54:10
Question: When I try to write to the out collector in a flatMap operator, I get an illegal state exception (only under high load): Buffer pool is destroyed. What am I doing wrong here? When does Flink throw the buffer pool error?

    java.lang.RuntimeException: Buffer pool is destroyed.
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
        at org.apache.flink.streaming...
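The trace is cut off above, but as a hedged note: "Buffer pool is destroyed" is normally a symptom rather than the root cause; the task's network buffer pool has already been released (typically because the task is being cancelled after a failure elsewhere in the job) while user code keeps calling collect(). One way to provoke it is emitting from a user-spawned thread that can outlive the operator; the sketch below illustrates that anti-pattern and is not the asker's code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // ANTI-PATTERN: the executor thread may call out.collect() after the task
    // was cancelled and its buffer pool destroyed, yielding exactly this error.
    public class AsyncEmittingFlatMap extends RichFlatMapFunction<String, String> {

        private transient ExecutorService executor;

        @Override
        public void open(Configuration parameters) {
            executor = Executors.newSingleThreadExecutor();
        }

        @Override
        public void flatMap(String value, Collector<String> out) {
            // Unsafe: emits outside the task thread and races with shutdown.
            executor.submit(() -> out.collect(value.toUpperCase()));
        }

        @Override
        public void close() {
            executor.shutdownNow();
        }
    }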