flink-sql

An exponentially decaying moving average over a hopping window in Flink SQL: Casting time

Submitted by 大兔子大兔子 on 2019-12-24 19:40:38
Question: Now that we have SQL with fancy windowing in Flink, I'm trying to implement the decaying moving average referred to by "what will be possible in future Flink releases for both the Table API and SQL." in their SQL roadmap/preview post from March 2017: table .window(Slide over 1.hour every 1.second as 'w) .groupBy('productId, 'w) .select('w.end, 'productId, ('unitPrice * ('rowtime - 'w.start).exp() / 1.hour).sum / (('rowtime - 'w.start).exp() / 1.hour).sum) Here is my attempt (inspired as well by the Calcite…
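
A minimal sketch of the weighted-average shape that query is after, expressed as Flink SQL over a hopping window. It assumes a StreamTableEnvironment with a registered table named Sales carrying a rowtime attribute and a hypothetical decayWeight column that already holds the per-row decay factor; deriving that factor from the window start is exactly the open part of the question.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class DecayingAverageSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Assumes a table "Sales" with columns productId, unitPrice, decayWeight
        // and a rowtime attribute named rowtime has already been registered.
        Table decayingAvg = tableEnv.sqlQuery(
            "SELECT " +
            "  productId, " +
            "  HOP_END(rowtime, INTERVAL '1' SECOND, INTERVAL '1' HOUR) AS w_end, " +
            "  SUM(unitPrice * decayWeight) / SUM(decayWeight) AS decayingAvg " +
            "FROM Sales " +
            "GROUP BY productId, HOP(rowtime, INTERVAL '1' SECOND, INTERVAL '1' HOUR)");
    }
}
```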

Why does Flink SQL use a cardinality estimate of 100 rows for all tables?

Submitted by 旧时模样 on 2019-12-20 03:01:27
Question: I wasn't sure why the logical plan wasn't being evaluated correctly in this example. I looked more deeply into the Flink code base and checked how Calcite evaluates/estimates the number of rows for the query object. For some reason it always returns 100 for any table source. In Flink, during creation of the program plan, the VolcanoPlanner class is invoked for each transformed rule by TableEnvironment.runVolcanoPlanner. The planner tries to optimise and calculate…
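
The 100-row figure is Calcite's fallback estimate when a table source provides no statistics. A quick, hedged way to see what plan the Volcano planner actually produced is TableEnvironment.explain; in the sketch below, "Orders" is a placeholder for any registered table source and tableEnv is the table environment it was registered in.

```java
// Print the abstract syntax tree and the plan the Volcano planner settled on.
Table query = tableEnv.sqlQuery("SELECT a, COUNT(*) AS cnt FROM Orders GROUP BY a");
System.out.println(tableEnv.explain(query));
```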

Having an equivalent to HOP_START inside an aggregation primitive in Flink

Submitted by 让人想犯罪 __ on 2019-12-14 03:09:00
Question: I'm trying to do an exponentially decaying moving average over a hopping window in Flink SQL. I need to have access to one of the borders of the window, the HOP_START, in the following: SELECT lb_index one_key, -- I have access to this one: HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) start_time, -- Aggregation primitive: SUM(Y * EXP(TIMESTAMPDIFF(SECOND, proctime, -- This one throws: HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)))) FROM write…
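
For reference, the group-window auxiliary functions are accepted only as plain select-list expressions, not inside aggregate calls. A hedged sketch of the shape that does parse (table and column names are loosely taken from the question and are assumptions; the decay weight inside the aggregate would have to be built from per-row fields only):

```java
// tableEnv is assumed to be a StreamTableEnvironment with "source_table" registered.
Table hopped = tableEnv.sqlQuery(
    "SELECT " +
    "  lb_index AS one_key, " +
    "  HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) AS start_time, " +
    "  HOP_END(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) AS end_time, " +
    "  SUM(y) AS total_y " +  // HOP_START may not appear inside SUM()
    "FROM source_table " +
    "GROUP BY lb_index, HOP(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)");
```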

Does Flink SQL support Java Map types?

Submitted by 懵懂的女人 on 2019-12-14 02:32:39
Question: I'm trying to access a key from a map using Flink's SQL API. It fails with the error Exception in thread "main" org.apache.flink.table.api.TableException: Type is not supported: ANY. Please advise how I can fix it. Here is my event class: public class EventHolder { private Map<String,String> event; public Map<String, String> getEvent() { return event; } public void setEvent(Map<String, String> event) { this.event = event; } } Here is the main class which submits the Flink job: public class…
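
One workaround pattern, sketched under assumptions: re-wrap the POJO stream as Row with an explicit RowTypeInfo so the map field is typed as MAP<STRING, STRING> instead of a generic ANY type, then use the bracket syntax to read a key. EventHolder and its event field come from the question; the field name "event" in the registered table and the key "foo" are illustrative.

```java
import java.util.Collections;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class MapFieldSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Tiny in-memory source using the EventHolder class from the question.
        EventHolder holder = new EventHolder();
        holder.setEvent(Collections.singletonMap("foo", "bar"));
        DataStream<EventHolder> events = env.fromElements(holder);

        // Declare the map type explicitly so the planner sees MAP<STRING, STRING>
        // rather than a generic ANY type.
        DataStream<Row> rows = events
            .map(e -> Row.of(e.getEvent()))
            .returns(new RowTypeInfo(Types.MAP(Types.STRING, Types.STRING)));

        tableEnv.registerDataStream("events", rows, "event");

        // Map values are then addressable with the bracket syntax.
        Table result = tableEnv.sqlQuery("SELECT event['foo'] FROM events");
    }
}
```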

How can I create an External Catalog Table in Apache Flink?

Submitted by 本小妞迷上赌 on 2019-12-13 00:56:41
Question: I tried to create an ExternalCatalog to use with the Apache Flink Table API. I created it and added it to the Flink table environment (following the official documentation). For some reason, the only external table present in the catalog is not found during the scan. What did I miss in the following code? val catalogName = s"externalCatalog$fileNumber" val ec: ExternalCatalog = getExternalCatalog(catalogName, 1, tableEnv) tableEnv.registerExternalCatalog(catalogName, ec) val s1: Table = tableEnv.scan("S_EXT")…
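
One thing worth checking, sketched here under assumptions: a table that lives inside a registered ExternalCatalog is addressed by its full path (catalog, database, table) rather than by the bare table name. The snippet reuses tableEnv and ec from the question; "db1" and "S_EXT" stand in for whatever database and table the catalog actually holds.

```java
tableEnv.registerExternalCatalog("externalCatalog1", ec);

// Scan with the full path: catalog name, database name, table name,
// not just tableEnv.scan("S_EXT").
Table s1 = tableEnv.scan("externalCatalog1", "db1", "S_EXT");
```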

Apache Flink - enable join ordering

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-12 19:24:43
Question: I have noticed that Apache Flink does not optimise the order in which the tables are joined. At the moment, it keeps the user-specified join order (basically, it takes the query literally). I suppose that Apache Calcite can optimise the order of joins, but for some reason these rules are not in use in Apache Flink. If, for example, we have two tables 'R' and 'S': private val tableEnv: BatchTableEnvironment = TableEnvironment.getTableEnvironment(env) private val fileNumber = 1 tableEnv…
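
For completeness: newer Flink versions (the Blink planner, 1.9+) put join reordering behind a configuration flag rather than enabling it by default. A hedged sketch of flipping it on, assuming a table environment named tableEnv; this does not apply to the older batch TableEnvironment shown in the question.

```java
// Join reordering is off by default and is a Blink-planner option.
tableEnv.getConfig().getConfiguration()
        .setBoolean("table.optimizer.join-reorder-enabled", true);
```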

Why are messages not consumed on time when consuming from Kafka with Flink streaming SQL and GROUP BY TUMBLE(rowtime)?

Submitted by 这一生的挚爱 on 2019-12-11 17:22:25
Question: When I produce 20 messages, only 13 are consumed; the remaining 7 are not consumed in real time. Some time later, when I produce another 20 messages, the 7 leftover messages from last time are finally consumed. Complete code is at: https://github.com/shaozhipeng/flink-quickstart/blob/master/src/main/java/me/icocoro/quickstart/streaming/sql/KafkaStreamToJDBCTable.java Update: using a different AssignerWithPeriodicWatermarks was not effective. private static final String LOCAL_KAFKA_BROKER = "localhost…
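
The usual suspect behind symptoms like this is watermark progress: an event-time TUMBLE window only fires once a watermark past the window end has been emitted, and with a periodic assigner that only happens when later records arrive (and when every Kafka partition is producing data). A minimal sketch of a bounded-out-of-orderness assigner; POJO and getEventTime() are stand-ins for the types used in the linked code.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkSketch {
    // Attach a bounded-out-of-orderness watermark assigner to the Kafka stream.
    static DataStream<POJO> withWatermarks(DataStream<POJO> kafkaStream) {
        return kafkaStream.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<POJO>(Time.seconds(5)) {
                @Override
                public long extractTimestamp(POJO element) {
                    return element.getEventTime(); // epoch-millis event time
                }
            });
    }
}
```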

Apache Flink: Performance issue when running many jobs

Submitted by 不问归期 on 2019-12-11 13:45:04
Question: With a high number of Flink SQL queries (100 of the query below), the Flink command-line client fails with "JobManager did not respond within 600000 ms" on a YARN cluster, i.e. the job is never started on the cluster. The JobManager logs show nothing after the last TaskManager started, except DEBUG logs with "job with ID 5cd95f89ed7a66ec44f2d19eca0592f7 not found in JobManager", indicating it is likely stuck (creating the ExecutionGraph?). The same thing works as a standalone Java program locally (high CPU initially…

Flink: Rowtime attributes must not be in the input rows of a regular join

Submitted by 孤人 on 2019-12-11 04:48:04
Question: Using the Flink SQL API, I want to join multiple tables together and do some computation over a time window. I have 3 tables coming from CSV files, and one coming from Kafka. In the Kafka table, I have a field timestampMs that I want to use for my time window operations. For that I wrote the following code: StreamExecutionEnvironment env = ... ; StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); TableSource table1 = CsvTableSource.builder() .path("path/to/file1.csv")…
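
The error in the title arises because a regular join cannot preserve a rowtime attribute; it disappears when the join itself is time-bounded (an interval join), or the rowtime can be cast to a plain TIMESTAMP before a regular join at the cost of losing its time-attribute property. A hedged sketch of the interval-join shape only; the table names KafkaTable/CsvTable, the columns, and the assumption that both inputs carry a rowtime attribute are illustrative, not the questioner's setup.

```java
// A time-bounded (interval) join keeps the rowtime usable for windowing downstream.
Table joined = tableEnv.sqlQuery(
    "SELECT k.id, k.metric, c.label " +
    "FROM KafkaTable AS k, CsvTable AS c " +
    "WHERE k.id = c.id " +
    "AND k.rowtime BETWEEN c.rowtime - INTERVAL '5' SECOND " +
    "AND c.rowtime + INTERVAL '5' SECOND");
```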

How to sort a stream by event time using Flink SQL

Submitted by 不想你离开。 on 2019-11-28 11:26:33
Question: I have an out-of-order DataStream<Event> that I want to sort so that the events are ordered by their event-time timestamps. I've simplified my use case down to where my Event class has just a single field -- the timestamp field: public static void main(String[] args) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env); env.setStreamTimeCharacteristic…
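
A sketch of the SQL that does the sorting once the stream is registered with a rowtime attribute: streaming SQL accepts ORDER BY only when it sorts ascending on the event-time attribute, which is exactly this case. The field name eventTime and the assumption that the stream already has timestamps and watermarks assigned are placeholders around the question's simplified Event class.

```java
// eventStream: DataStream<Event> with timestamps/watermarks already assigned.
// Register it with the timestamp field declared as the rowtime attribute.
tableEnv.registerDataStream("events", eventStream, "eventTime.rowtime");

// Ascending sort on the rowtime attribute is allowed in streaming SQL.
Table sorted = tableEnv.sqlQuery("SELECT eventTime FROM events ORDER BY eventTime");

// Emit the sorted result back as a stream.
DataStream<Row> sortedStream = tableEnv.toAppendStream(sorted, Row.class);
```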