apache-flink

How to load external jars in Flink

Submitted by 这一生的挚爱 on 2021-01-29 06:06:02
Question: When I submit jobs to Flink in standalone cluster mode, I find that each time the taskManager fetches the jar from the jobManager (even for the same jar), which takes a long time. I am wondering whether it is possible to keep these jars on each worker node so that they automatically load the jars locally for each run.

Answer 1: When you deploy your cluster, any jar in the lib folder will be available to the nodes in your cluster. The lib folder is the one that typically contains the flink-dist jar and the rest of the Flink runtime dependencies.
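As an illustration of where such jars go, here is a sketch of a typical Flink distribution layout (the connector jar name is a placeholder); a jar copied into lib/ on every node before the cluster starts is then loaded locally rather than fetched from the jobManager on each run:

    flink-1.x/
        bin/
        conf/
        lib/
            flink-dist_2.11-1.x.jar     <- shipped with Flink
            my-connector.jar            <- user jar placed on every node's classpath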

Detect absence of a certain event

Submitted by 萝らか妹 on 2021-01-28 17:50:06
Question: In the documentation of FlinkCEP, I found that I can enforce that a particular event doesn't occur between two other events using notFollowedBy or notNext. However, I was wondering if I could detect the absence of a certain event after a time X. For example, if an event A is not followed by another event A within 10 seconds, fire an alert or do something. Would it be possible to define a FlinkCEP pattern to capture that situation? Thanks in advance, Humberto

Answer 1: Although Flink CEP does not allow a pattern to end with notFollowedBy, you can detect the absence of an event by putting a time constraint on the pattern with within() and reacting to the partial matches that time out.
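The answer is truncated above; the standard technique it most likely refers to is to model "A followed by A within 10 seconds" and treat the timed-out partial matches as the absence signal. Below is a minimal hedged sketch against the Flink 1.7+ CEP API; the Event POJO with getType()/getId() and the String alerts are assumptions for illustration:

    import java.util.List;
    import java.util.Map;

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternFlatSelectFunction;
    import org.apache.flink.cep.PatternFlatTimeoutFunction;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class AbsenceDetection {

        public static DataStream<String> alerts(DataStream<Event> events) {
            // Match "A followed by A within 10s"; a partial match that times
            // out is exactly the case "no second A within 10 seconds".
            Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return "A".equals(e.getType()); }
                    })
                    .followedBy("second")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return "A".equals(e.getType()); }
                    })
                    .within(Time.seconds(10));

            PatternStream<Event> ps = CEP.pattern(events.keyBy(Event::getId), pattern);

            OutputTag<String> timedOut = new OutputTag<String>("timed-out") {};

            SingleOutputStreamOperator<String> matched = ps.flatSelect(
                    timedOut,
                    new PatternFlatTimeoutFunction<Event, String>() {
                        @Override
                        public void timeout(Map<String, List<Event>> partial, long ts,
                                            Collector<String> out) {
                            out.collect("No follow-up A within 10s after " + partial.get("first").get(0));
                        }
                    },
                    new PatternFlatSelectFunction<Event, String>() {
                        @Override
                        public void flatSelect(Map<String, List<Event>> match, Collector<String> out) {
                            // The second A arrived in time: nothing to emit.
                        }
                    });

            // The alerts live on the side output carrying the timed-out matches.
            return matched.getSideOutput(timedOut);
        }
    }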

Flink Table API & SQL and map types (Scala)

Submitted by 怎甘沉沦 on 2021-01-28 12:42:23
Question: I am using Flink's Table API and/or Flink's SQL support (Flink 1.3.1, Scala 2.11) in a streaming environment. I'm starting with a DataStream[Person], where Person is a case class that looks like:

    Person(name: String, age: Int, attributes: Map[String, String])

All is working as expected until I start to bring attributes into the picture. For example:

    val result = streamTableEnvironment.sql(
      """
        |SELECT
        |  name,
        |  attributes['foo'],
        |  TUMBLE_START(rowtime, INTERVAL '1' MINUTE)
        |FROM myTable
        |GROUP BY name, TUMBLE(rowtime, INTERVAL '1' MINUTE)
      """.stripMargin)

Flink SQL Client connect to non local cluster

Submitted by 拜拜、爱过 on 2021-01-28 11:51:04
Question: Is it possible to connect the Flink SQL client to a remote cluster? I assume the client uses some configuration to determine the job manager address, but I don't see it mentioned in the docs.

Answer 1: Yes, that's possible. You can configure the connection to a remote cluster in the conf/flink-conf.yaml file:

    jobmanager.rpc.address: localhost
    jobmanager.rpc.port: 6123

Source: https://stackoverflow.com/questions/61623836/flink-sql-client-connect-to-non-local-cluster
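For a remote cluster you would point those same keys at the remote JobManager instead of localhost, e.g. (the hostname is hypothetical):

    jobmanager.rpc.address: jobmanager.example.com
    jobmanager.rpc.port: 6123

and then start the client with bin/sql-client.sh embedded, which picks up conf/flink-conf.yaml from the same distribution.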

KafkaAvroDeserializer failing with Kryo Exception

Submitted by 谁说我不能喝 on 2021-01-28 11:02:00
Question: I have written a consumer to read Avro generic records using a schema registry.

    FlinkKafkaConsumer010 kafkaConsumer010 =
        new FlinkKafkaConsumer010(KAFKA_TOPICS,
            new KafkaGenericAvroDeserializationSchema(schemaRegistryUrl),
            properties);

And the deserialization class looks like this:

    public class KafkaGenericAvroDeserializationSchema
            implements KeyedDeserializationSchema<GenericRecord> {

        private final String registryUrl;
        private transient KafkaAvroDeserializer inner;

        public ...
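The class is cut off above; below is a minimal hedged completion of such a schema, a sketch rather than the asker's actual code (the lazy construction of the Confluent deserializer is an assumption; KafkaAvroDeserializer is not serializable, hence the transient field). Note getProducedType(): extractor-based type information for GenericRecord degrades to a generic type, so Flink ships records between operators with Kryo, which is the usual origin of the Kryo exception in the title:

    import java.io.IOException;

    import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
    import io.confluent.kafka.serializers.KafkaAvroDeserializer;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.TypeExtractor;
    import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

    public class KafkaGenericAvroDeserializationSchema
            implements KeyedDeserializationSchema<GenericRecord> {

        private final String registryUrl;
        private transient KafkaAvroDeserializer inner;

        public KafkaGenericAvroDeserializationSchema(String registryUrl) {
            this.registryUrl = registryUrl;
        }

        @Override
        public GenericRecord deserialize(byte[] messageKey, byte[] message,
                                         String topic, int partition, long offset) throws IOException {
            if (inner == null) {
                // Created lazily on the task manager: the Confluent deserializer
                // is not serializable, so it cannot travel with the job graph.
                inner = new KafkaAvroDeserializer(new CachedSchemaRegistryClient(registryUrl, 1000));
            }
            return (GenericRecord) inner.deserialize(topic, message);
        }

        @Override
        public boolean isEndOfStream(GenericRecord nextElement) {
            return false;
        }

        @Override
        public TypeInformation<GenericRecord> getProducedType() {
            // GenericRecord exposes no fields Flink can analyze, so this yields
            // a GenericTypeInfo and records are serialized with Kryo.
            return TypeExtractor.getForClass(GenericRecord.class);
        }
    }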

Combining low-latency streams with multiple meta-data streams in Flink (enrichment)

Submitted by 旧巷老猫 on 2021-01-28 08:30:48
Question: I am evaluating Flink for a streaming-analytics scenario and haven't found sufficient information on how to fulfil the kind of ETL setup we have in a legacy system today. A very common scenario is that we have keyed, slow-throughput meta-data streams that we want to use for enrichment on high-throughput data streams, something along the lines of: [diagram omitted]. This raises two questions concerning Flink: How does one enrich a fast-moving stream with slowly updating streams where the time windows overlap?
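The question is cut off, but for its first part one commonly suggested Flink building block is to connect the two streams on the enrichment key and hold the latest meta-data in keyed state; the sketch below assumes hypothetical Order, CustomerMeta and EnrichedOrder types and the Flink 1.8+ KeyedCoProcessFunction:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    public class MetaEnrichment {

        public static DataStream<EnrichedOrder> enrich(
                DataStream<Order> orders, DataStream<CustomerMeta> meta) {
            return orders
                .keyBy(o -> o.customerId)
                .connect(meta.keyBy(m -> m.customerId))
                .process(new KeyedCoProcessFunction<String, Order, CustomerMeta, EnrichedOrder>() {

                    private transient ValueState<CustomerMeta> latestMeta;

                    @Override
                    public void open(Configuration parameters) {
                        latestMeta = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("latest-meta", CustomerMeta.class));
                    }

                    @Override
                    public void processElement1(Order order, Context ctx,
                                                Collector<EnrichedOrder> out) throws Exception {
                        // Fast stream: enrich with the newest meta-data seen so far
                        // (null until the first meta record for this key arrives).
                        out.collect(new EnrichedOrder(order, latestMeta.value()));
                    }

                    @Override
                    public void processElement2(CustomerMeta m, Context ctx,
                                                Collector<EnrichedOrder> out) throws Exception {
                        // Slow stream: just remember the latest value per key.
                        latestMeta.update(m);
                    }
                });
        }
    }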

Apache Flink RollingFileAppender

Submitted by 会有一股神秘感。 on 2021-01-28 06:50:47
Question: I'm using Apache Flink v1.2. I wanted to switch to a rolling file appender to avoid huge log files containing data for several days. However, it doesn't seem to work. I adapted the log4j configuration (log4j.properties) as follows:

    log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
    log4j.appender.file.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
    log4j.appender.file.DatePattern='.' yyyy-MM-dd-a'.log'
    log4j.appender.file.MaxBackupIndex = 15
    log4j.appender.file...
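A hedged observation on the snippet above: the org.apache.log4j.rolling.* classes are not part of core log4j 1.x but of the separate apache-log4j-extras jar, so unless that jar is added to Flink's lib folder the appender class cannot even be loaded (and TimeBasedRollingPolicy is configured via a FileNamePattern, not a DatePattern). A minimal sketch that works with the stock log4j 1.x shipped with Flink is a daily rolling appender, at the cost of losing MaxBackupIndex, which DailyRollingFileAppender does not support:

    log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.file.File=${log.file}
    log4j.appender.file.DatePattern='.'yyyy-MM-dd
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n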

About core allocation for flink task manager and task slot

Submitted by 怎甘沉沦 on 2021-01-28 05:44:54
Question: I use the following command to start a Flink YARN session:

    yarn-session.sh -jm 4096 -tm 4096 -n 4 -s 2

With the above command, 4 task managers will be started (that is, 4 YARN containers, since every task manager is a YARN container), with 2 slots for each task manager. Since one task manager is one YARN container, does that mean only one core is allocated to each task manager? I have specified 2 slots per task manager, so will the two slots share only one core?
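A note on the premise, sketched against Flink's real yarn.containers.vcores option: by default Flink requests one vcore per task slot for each TaskManager container (the option defaults to the number of slots), and it can also be set explicitly in conf/flink-conf.yaml:

    # request 2 vcores per TaskManager container
    # (default: the number of task slots per TaskManager)
    yarn.containers.vcores: 2

Whether YARN actually enforces the vcore request depends on the scheduler's resource calculator; with the default memory-only calculator the setting is effectively advisory.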

Apache Flink checkpointing stuck

Submitted by 一曲冷凌霜 on 2021-01-28 02:05:57
Question: We are running a job that has ListState of between 300 GB and 400 GB, and at times a list can grow to a few thousand entries. In our use case, every item must have its own TTL, so we create a new Timer for every new item of this ListState, with a RocksDB backend on S3. That is currently about 140+ million timers (which will trigger at event.timestamp + 40 days). Our problem is that the checkpointing of the job suddenly gets stuck, or becomes VERY slow (like 1% in a few hours), until it eventually times out.
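Since every element has the same 40-day lifetime, one hedged alternative to 140+ million timers is Flink's state TTL, which for list state is checked per element and which (since Flink 1.9, an assumption here) can expire RocksDB entries during compaction instead of via timers; the caveat is that state TTL is based on processing time, not on event.timestamp:

    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.time.Time;

    // Inside open() of the rich function that owns the list state
    // (Event is a placeholder for the actual element type):
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(40))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            // expire lazily in RocksDB's compaction filter: no timer state at all
            .cleanupInRocksdbCompactFilter(1000)
            .build();

    ListStateDescriptor<Event> descriptor = new ListStateDescriptor<>("items", Event.class);
    descriptor.enableTimeToLive(ttlConfig); // for list state, TTL applies per element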

“Buffer pool is destroyed” issue found in Apache Flink flatMap Operator

Submitted by 陌路散爱 on 2021-01-27 23:54:10
Question: When I try to write to the out collector in a flatMap operator, I get an illegal state exception (only under high load): Buffer pool is destroyed. What am I doing wrong here? When does Flink throw the buffer pool error?

    java.lang.RuntimeException: Buffer pool is destroyed.
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
        at org.apache.flink.streaming...
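The trace is cut off above, but as a hedged note: "Buffer pool is destroyed" is normally a symptom rather than the root cause; the task's network buffer pool has already been released (typically because the task is being cancelled after a failure elsewhere in the job) while user code keeps calling collect(). One way to provoke it is emitting from a user-spawned thread that can outlive the operator; the sketch below illustrates that anti-pattern and is not the asker's code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // ANTI-PATTERN: the executor thread may call out.collect() after the task
    // was cancelled and its buffer pool destroyed, yielding exactly this error.
    public class AsyncEmittingFlatMap extends RichFlatMapFunction<String, String> {

        private transient ExecutorService executor;

        @Override
        public void open(Configuration parameters) {
            executor = Executors.newSingleThreadExecutor();
        }

        @Override
        public void flatMap(String value, Collector<String> out) {
            // Unsafe: emits outside the task thread and races with shutdown.
            executor.submit(() -> out.collect(value.toUpperCase()));
        }

        @Override
        public void close() {
            executor.shutdownNow();
        }
    }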