flink-streaming

Unable to execute HTTP request: Timeout waiting for connection from pool in Flink

Posted by ≯℡__Kan透↙ on 2021-01-29 09:02:26
Question: I'm working on an app that uploads some files to an S3 bucket and, at a later point, reads those files back from the bucket and pushes them into my database. I'm using Flink 1.4.2 and the fs.s3a API for reading and writing files in the S3 bucket. Uploading files to the bucket works fine, but when the second phase of the app starts (reading the uploaded files from S3), it throws the following error:

Caused by: java.io.InterruptedIOException: Reopen at position 0 on s3a:/
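
The "Timeout waiting for connection from pool" message usually means the S3A client's HTTP connection pool is exhausted, for example by many parallel readers or by input streams that are never closed. A minimal mitigation sketch, assuming pool exhaustion is the cause: raise the pool size (and, if needed, the connection timeout) in the Hadoop configuration that the s3a filesystem picks up, e.g. core-site.xml. The property names are standard Hadoop S3A keys; the values are only illustrative.

<property>
  <name>fs.s3a.connection.maximum</name>
  <value>128</value>
</property>
<property>
  <name>fs.s3a.connection.timeout</name>
  <value>200000</value>
</property>

It is also worth checking the read-side code so that every S3 input stream that gets opened is closed again.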

Apache Flink Mapping at Runtime

Posted by 霸气de小男生 on 2021-01-29 07:01:17
Question: I have built a Flink streaming job that reads an XML file from Kafka, converts it, and writes it into a database. Because the attributes in the XML file don't match the database column names, I built a switch/case for the mapping. As this is not really flexible, I want to take this hard-wired mapping information out of the code. My first idea was a mapping file, which could look like this:

path.in.xml.to.attribut=database.column.name

The current job logic looks like this: switch
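
A minimal sketch (Scala; the resource name is illustrative, not from the question) of loading such a key/value mapping file once when the job is assembled, so that a lookup replaces the hard-wired switch/case:

import java.util.Properties
import scala.collection.JavaConverters._

// Load "path.in.xml.to.attribut=database.column.name" pairs from the classpath.
def loadFieldMapping(resource: String): Map[String, String] = {
  val props = new Properties()
  val in = getClass.getResourceAsStream(resource)
  try props.load(in) finally in.close()
  props.asScala.toMap
}

val fieldMapping: Map[String, String] = loadFieldMapping("/field-mapping.properties")

// Inside the existing map function the switch/case becomes a lookup,
// e.g. fieldMapping.getOrElse(xmlPath, xmlPath).

Because the loaded Map is a plain serializable value, it can be captured by the map function's closure and shipped with the job; no Flink-specific machinery is needed unless the file has to be re-read while the job is running (in that case a broadcast stream is the usual tool).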

Flink Table API & SQL and map types (Scala)

Posted by 怎甘沉沦 on 2021-01-28 12:42:23
Question: I am using Flink's Table API and/or Flink's SQL support (Flink 1.3.1, Scala 2.11) in a streaming environment. I'm starting with a DataStream[Person], where Person is a case class that looks like:

Person(name: String, age: Int, attributes: Map[String, String])

Everything works as expected until I start to bring attributes into the picture. For example:

val result = streamTableEnvironment.sql(
  """
    |SELECT
    |name,
    |attributes['foo'],
    |TUMBLE_START(rowtime, INTERVAL '1' MINUTE)
    |FROM myTable
    |GROUP
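
One way to sidestep map handling in the SQL layer entirely (a sketch of an alternative, not necessarily the author's intent) is to project the needed attribute on the DataStream side before registering the table, so the query only sees flat columns:

import org.apache.flink.streaming.api.scala._

case class Person(name: String, age: Int, attributes: Map[String, String])

val persons: DataStream[Person] = ???

// Pull the single attribute out of the map up front; the table registered from
// "flattened" then only needs plain String/Int columns.
val flattened: DataStream[(String, Int, String)] =
  persons.map(p => (p.name, p.age, p.attributes.getOrElse("foo", "")))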

KafkaAvroDeserializer failing with Kryo Exception

Posted by 谁说我不能喝 on 2021-01-28 11:02:00
Question: I have written a consumer to read Avro's GenericRecord using a schema registry.

FlinkKafkaConsumer010 kafkaConsumer010 = new FlinkKafkaConsumer010(
    KAFKA_TOPICS, new KafkaGenericAvroDeserializationSchema(schemaRegistryUrl), properties);

And the deserialization class looks like this:

public class KafkaGenericAvroDeserializationSchema implements KeyedDeserializationSchema<GenericRecord> {
    private final String registryUrl;
    private transient KafkaAvroDeserializer inner;

    public

Combining low-latency streams with multiple meta-data streams in Flink (enrichment)

Posted by 旧巷老猫 on 2021-01-28 08:30:48
Question: I am evaluating Flink for a streaming analytics scenario and haven't found sufficient information on how to reproduce the kind of ETL setup we run in a legacy system today. A very common scenario is that we have keyed, slow-throughput meta-data streams that we want to use to enrich high-throughput data streams, something along the lines of: This raises two questions concerning Flink: How does one enrich a fast-moving stream with slowly updating streams where the time windows overlap,
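
The usual building block for this pattern is to key both streams by the enrichment key, connect them, and keep the latest meta-data per key in keyed state so every fast-stream element can be joined against it. A minimal sketch under those assumptions (types and field names are illustrative, not from the question):

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

case class Event(key: String, payload: String)     // fast, high-throughput stream
case class Meta(key: String, description: String)  // slow meta-data stream

val events: DataStream[Event] = ???
val metadata: DataStream[Meta] = ???

val enriched: DataStream[(Event, Option[Meta])] = events
  .keyBy(_.key)
  .connect(metadata.keyBy(_.key))
  .process(new CoProcessFunction[Event, Meta, (Event, Option[Meta])] {
    // Latest meta-data seen for the current key.
    lazy val metaState: ValueState[Meta] =
      getRuntimeContext.getState(new ValueStateDescriptor[Meta]("meta", classOf[Meta]))

    override def processElement1(event: Event,
                                 ctx: CoProcessFunction[Event, Meta, (Event, Option[Meta])]#Context,
                                 out: Collector[(Event, Option[Meta])]): Unit =
      out.collect((event, Option(metaState.value())))

    override def processElement2(meta: Meta,
                                 ctx: CoProcessFunction[Event, Meta, (Event, Option[Meta])]#Context,
                                 out: Collector[(Event, Option[Meta])]): Unit =
      metaState.update(meta)
  })

If the meta-data is not keyed the same way, or must be visible to every parallel instance, broadcast state (available from Flink 1.5 on) is the usual alternative; how to handle fast events that arrive before their meta-data (buffer, emit unenriched, or drop) is the part that needs a per-use-case decision.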

Apache Flink RollingFileAppender

Posted by 会有一股神秘感。 on 2021-01-28 06:50:47
Question: I'm using Apache Flink v1.2. I wanted to switch to a rolling file appender to avoid huge log files containing data for several days, but it doesn't seem to work. I adapted the log4j configuration (log4j.properties) as follows:

log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.file.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.file.DatePattern='.' yyyy-MM-dd-a'.log'
log4j.appender.file.MaxBackupIndex = 15
log4j.appender.file

Apache Flink checkpointing stuck

Posted by 一曲冷凌霜 on 2021-01-28 02:05:57
Question: We are running a job that has a ListState of between 300 GB and 400 GB, and at times the list can grow to a few thousand entries. In our use case every item must have its own TTL, so we create a new timer for every new item of this ListState, with a RocksDB backend on S3. This currently amounts to about 140+ million timers (which will fire at event.timestamp + 40 days). Our problem is that suddenly the checkpointing of the job gets stuck, or becomes VERY slow (like 1% in a few hours), until it eventually
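
For reference, the per-element timer pattern described above typically looks like the sketch below (Scala; the element type, the 40-day constant and the cleanup logic are illustrative assumptions, not the author's code):

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

case class Item(key: String, timestamp: Long, payload: String)

// Applied as: stream.keyBy(_.key).process(new PerItemTtlFunction)
class PerItemTtlFunction extends KeyedProcessFunction[String, Item, Item] {
  private val ttlMillis: Long = 40L * 24 * 60 * 60 * 1000 // 40 days

  private lazy val items: ListState[Item] =
    getRuntimeContext.getListState(new ListStateDescriptor[Item]("items", classOf[Item]))

  override def processElement(item: Item,
                              ctx: KeyedProcessFunction[String, Item, Item]#Context,
                              out: Collector[Item]): Unit = {
    items.add(item)
    // One timer per element: this is what produces the very large timer state
    // described in the question.
    ctx.timerService().registerEventTimeTimer(item.timestamp + ttlMillis)
    out.collect(item)
  }

  override def onTimer(timestamp: Long,
                       ctx: KeyedProcessFunction[String, Item, Item]#OnTimerContext,
                       out: Collector[Item]): Unit = {
    // Drop expired elements when a timer fires (illustrative cleanup).
    val remaining = items.get().asScala.filter(_.timestamp + ttlMillis > timestamp)
    items.update(remaining.toSeq.asJava)
  }
}

Depending on the Flink version, keeping timers in RocksDB rather than on the heap (state.backend.rocksdb.timer-service.factory: ROCKSDB) and enabling incremental checkpoints are usually the first knobs to inspect when timer volume is this large.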

Exception when trying to upgrade to Flink 1.3.1

Posted by 大兔子大兔子 on 2021-01-27 06:30:10
Question: I tried to upgrade the Flink version in my cluster to 1.3.1 (and 1.3.2 as well) and I got the following exception in my task managers:

2018-02-28 12:57:27,120 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask - Error during disposal of stream operator.
org.apache.kafka.common.KafkaException: java.lang.InterruptedException
    at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:424)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close

How to count the number of records processed by Apache Flink in a given time window

Posted by 僤鯓⒐⒋嵵緔 on 2021-01-01 04:29:58
Question: After defining a time window in Flink as follows:

val lines = socket.timeWindowAll(Time.seconds(5))

how can I compute the number of records in that particular window of 5 seconds?

Answer 1: The most efficient way to perform a count aggregation is a ReduceFunction. However, reduce has the restriction that the input and output types must be identical, so you have to convert the input to an Int before applying the window:

val socket: DataStream[(String)] = ???
val cnts: DataStream[Int] = socket
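
The excerpt cuts off here. Under the approach the answer describes (map every record to 1, then reduce inside the window), the completed pipeline would plausibly look like this sketch (my reconstruction, not the original answer's exact code):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val socket: DataStream[String] = ???

// Each record becomes a 1; summing the 1s per 5-second window yields the
// number of records that window contained.
val cnts: DataStream[Int] = socket
  .map(_ => 1)
  .timeWindowAll(Time.seconds(5))
  .reduce(_ + _)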