apache-flink

The implementation of the provided ElasticsearchSinkFunction is not serializable (flink-connector-elasticsearch6_2.11)

◇◆丶佛笑我妖孽 submitted on 2020-06-17 14:08:20

Question: A "non-serializable" error occurs when I follow the Flink documentation to write data via Flink streaming. I use Flink 1.6, Elasticsearch 6.4, and flink-connector-elasticsearch6. My code looks like this:

    @Test
    public void testStringInsert() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.enableCheckpointing(100);
        // DataStreamSource<String> input = env.fromCollection(Collections.singleton(…

How to implement a Group Window Function for an "OVER PARTITION BY" in Flink SQL?

别等时光非礼了梦想. submitted on 2020-06-17 13:25:10

Question: I'm trying to use time windows with Flink SQL. It has been hard for me to get familiar with the framework, but I have already defined my StreamExecutionEnvironment, StreamTableEnvironment, and FlinkKafkaConsumer, and then apply the SQL query and group by time windows as follows:

    val stream = env.addSource(
      new FlinkKafkaConsumer[String]("flink", new SimpleStringSchema(), properties)
    )
    val parsed: DataStream[Order] = stream.map(x => ....
    // then I register a DataStream as a table, (Flink Version: 9.3)
    tEnv…
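
For reference, a group window behaves differently from OVER (PARTITION BY ...): it emits one row per key and window rather than one row per input row. Here is a minimal Java sketch of a tumbling group window in the pre-Blink SQL dialect, assuming a registered table Orders with an event-time attribute rowtime and a user_id column (all of these names are hypothetical):

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class GroupWindowExample {
        // tEnv is the StreamTableEnvironment from the question; the Orders
        // table, its rowtime attribute, and user_id are assumed names.
        public static Table countPerUserPerMinute(StreamTableEnvironment tEnv) {
            // One result row per user_id per one-minute window.
            return tEnv.sqlQuery(
                "SELECT user_id, COUNT(*) AS cnt, " +
                "       TUMBLE_START(rowtime, INTERVAL '1' MINUTE) AS w_start " +
                "FROM Orders " +
                "GROUP BY user_id, TUMBLE(rowtime, INTERVAL '1' MINUTE)");
        }
    }

HOP and SESSION windows follow the same GROUP BY pattern.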

Flink start Scala shell - NumberFormatException

醉酒当歌 submitted on 2020-06-17 09:09:21

Question: How can I start a Flink interactive (Scala) shell? Preferably using Scala 2.12; however, it looks like only 2.11 is working for now. Anyway, when using 2.11, i.e. downloading https://www.apache.org/dyn/closer.lua/flink/flink-1.10.1/flink-1.10.1-bin-scala_2.11.tgz, unzipping, and executing ./bin/start-scala-shell.sh local, I get the following error:

    [ERROR] Failed to construct terminal; falling back to unsupported
    java.lang.NumberFormatException: For input string: "0x100"
        at java.lang…

How to store checkpoints in a remote RocksDB in Apache Flink

人走茶凉 submitted on 2020-06-17 09:07:07

Question: I know that there are three kinds of state backends in Apache Flink: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend. MemoryStateBackend stores checkpoints in local RAM, FsStateBackend stores checkpoints in the local file system, and RocksDBStateBackend stores checkpoints in RocksDB. I have some questions about the RocksDBStateBackend. As I understand it, the mechanism of the RocksDBStateBackend is embedded into Apache Flink. RocksDB is a kind of key-value DB. …
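
For context: with the RocksDBStateBackend, RocksDB itself is embedded and local. Each TaskManager keeps its working state in RocksDB instances on its own disk, and only checkpoint snapshots are written to the remote, durable path you configure. A minimal sketch, with a hypothetical HDFS URI:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbCheckpointJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // State lives in embedded RocksDB instances on each TaskManager's
            // local disk; checkpoints of that state go to the path given here.
            RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints", true); // incremental
            env.setStateBackend(backend);
            env.enableCheckpointing(60_000);
            // ... build and execute the job
        }
    }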

Is one TaskManager with three slots the same as three TaskManagers with one slot in Apache Flink

旧城冷巷雨未停 submitted on 2020-06-17 02:53:25

Question: In Flink, as I understand it, the JobManager can assign a job to multiple TaskManagers with multiple slots if necessary. For example, one job can be assigned to three TaskManagers, using five slots. Now, say that I run one TaskManager (TM) with three slots, assigned 3 GB of RAM and one CPU. Is this exactly the same as running three TaskManagers that share one CPU, each assigned 1 GB of RAM?

    case 1
    ---------------
    | 3G RAM      |
    | one CPU     |
    | three slots |
    | TM          |
    -------------…

Create FlinkSQL UDF with generic return type

假装没事ソ submitted on 2020-06-16 03:53:39

Question: I would like to define a function MAX_BY that takes a value of type T and an ordering parameter of type Number, and returns the max element of the window according to the ordering (of type T). I've tried:

    public class MaxBy<T> extends AggregateFunction<T, Tuple2<T, Number>> {
        @Override
        public T getValue(Tuple2<T, Number> tuple) {
            return tuple.f0;
        }

        @Override
        public Tuple2<T, Number> createAccumulator() {
            return Tuple2.of(null, 0L);
        }

        public void accumulate(Tuple2<T, Number> acc, T value, Number order) {
            if…
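
The usual obstacle here is that Flink's type extraction cannot infer a generic result type such as T for a table AggregateFunction. A common workaround is to supply the type information explicitly by overriding getResultType() and getAccumulatorType(). The sketch below assumes the caller passes in the TypeInformation for T at construction time; the constructor parameter is an addition for illustration, not part of the original code.

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.table.functions.AggregateFunction;

    public class MaxBy<T> extends AggregateFunction<T, Tuple2<T, Number>> {

        private final TypeInformation<T> valueType; // supplied by the caller

        public MaxBy(TypeInformation<T> valueType) {
            this.valueType = valueType;
        }

        @Override
        public T getValue(Tuple2<T, Number> acc) {
            return acc.f0;
        }

        @Override
        public Tuple2<T, Number> createAccumulator() {
            return Tuple2.of(null, null);
        }

        public void accumulate(Tuple2<T, Number> acc, T value, Number order) {
            // keep the value with the largest ordering seen so far
            if (acc.f1 == null || order.doubleValue() > acc.f1.doubleValue()) {
                acc.f0 = value;
                acc.f1 = order;
            }
        }

        @Override
        public TypeInformation<T> getResultType() {
            return valueType;
        }

        @Override
        public TypeInformation<Tuple2<T, Number>> getAccumulatorType() {
            return Types.TUPLE(valueType, TypeInformation.of(Number.class));
        }
    }

An instance would then be registered per concrete type, e.g. new MaxBy<>(Types.STRING).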

Apache Flink: java.lang.NoClassDefFoundError

ⅰ亾dé卋堺 submitted on 2020-06-15 04:25:07

Question: I'm trying to follow this example, but when I try to compile it, I get this error:

    Error: Unable to initialize main class com.amazonaws.services.kinesisanalytics.aws
    Caused by: java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/functions/source/SourceFunction

The error is due to this code:

    private static DataStream<String> createSourceFromStaticConfig(StreamExecutionEnvironment env) {
        Properties inputProperties = new Properties();
        inputProperties.setProperty(ConsumerConfigConstants…

MongoDB as datasource to Flink

冷暖自知 submitted on 2020-06-11 08:43:05

Question: Can MongoDB be used as a data source for Apache Flink to process streaming data? What is the native way in Apache Flink to use a NoSQL database as a data source?

Answer 1: Currently, Flink does not have a dedicated connector to read from MongoDB. What you can do is the following: use StreamExecutionEnvironment.createInput and provide a Hadoop input format for MongoDB using Flink's wrapper input format, or implement your own MongoDB source by implementing SourceFunction/…
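
As a rough illustration of the second option, here is a minimal, non-parallel source that emits each document of a collection as a JSON string and then finishes. It assumes the MongoDB sync driver is on the classpath; the connection URI, database, and collection names are hypothetical, and a production version would need checkpointing and fault tolerance.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
    import org.bson.Document;

    // Emits every document of one collection as JSON, then finishes.
    public class MongoSource extends RichSourceFunction<String> {

        private transient MongoClient client;  // not serializable: create in open()
        private volatile boolean running = true;

        @Override
        public void open(Configuration parameters) {
            client = MongoClients.create("mongodb://localhost:27017"); // hypothetical URI
        }

        @Override
        public void run(SourceContext<String> ctx) {
            for (Document doc : client.getDatabase("mydb")      // hypothetical db
                                      .getCollection("events")  // hypothetical collection
                                      .find()) {
                if (!running) {
                    break;
                }
                ctx.collect(doc.toJson());
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
            }
        }
    }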

Apache Flink: ProcessWindowFunction KeyBy() multiple values

荒凉一梦 submitted on 2020-06-07 06:08:26

Question: I'm trying to use a WindowFunction with a DataStream; my goal is to have a query like the following:

    SELECT *,
           count(id) OVER(PARTITION BY country) AS c_country,
           count(id) OVER(PARTITION BY city) AS c_city,
           count(id) OVER(PARTITION BY city) AS c_addrs
    FROM fm
    ORDER BY country

I have already been helped with the aggregation by the country field, but I need to do the aggregation by two fields in the same time window. I don't know if it is possible to have two or more keys in keyBy() for this case.

    val parsed =…
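
keyBy() takes a single key, but that key may be composite: a KeySelector can return a tuple of several fields so that records are partitioned by the combination. A minimal Java sketch, with a hypothetical Event type standing in for the asker's parsed records:

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.KeyedStream;

    public class CompositeKeyExample {
        // Stand-in for the asker's parsed record type.
        public static class Event {
            public String country;
            public String city;
        }

        // Partition by the (country, city) combination with a single keyBy().
        public static KeyedStream<Event, Tuple2<String, String>> keyByCountryAndCity(
                DataStream<Event> parsed) {
            return parsed.keyBy(new KeySelector<Event, Tuple2<String, String>>() {
                @Override
                public Tuple2<String, String> getKey(Event e) {
                    return Tuple2.of(e.country, e.city);
                }
            });
        }
    }

Note that a composite key aggregates per (country, city) pair; independent per-country and per-city counts, as in the quoted query, would require separate keyed streams or a window function that computes the nested aggregates itself.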