Having an equivalent to HOP_START inside an aggregation primitive in Flink

让人想犯罪 __ 提交于 2019-12-14 03:09:00

问题


I'm trying to do an exponentially decaying moving average over a hopping window in Flink SQL. I need the have access to one of the borders of the window, the HOP_START in the following:

SELECT                                                                              
  lb_index one_key,
-- I have access to this one:
  HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) start_time,
-- Aggregation primitive:
  SUM(
    Y * EXP(TIMESTAMPDIFF(
      SECOND, 
      proctime, 
-- This one throws:
      HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)
  )))
FROM write_position                                                                
GROUP BY lb_index, HOP(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)

I'm getting the following stack trace:

11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using Aggregate(groupBy: (lb_index), window: (SlidingGroupWindow('w$, 'proctime, 5000.millis, 50.millis)), select: (lb_index, SUM($f2) AS Y, start('w$) AS w$start, end('w$) AS w$end, proctime('w$) AS w$proctime))
11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using Calc(select: (lb_index, proctime, *(payload.Y, EXP(/(CAST(/INT(Reinterpret(-(HOP_START(PROCTIME(proctime), 50, 5000), PROCTIME(proctime))), 1000)), 1000))) AS $f2))
11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using rel#459:DataStreamScan.DATASTREAM.true.Acc(table=[_DataStreamTable_0])
Exception in thread "main" org.apache.flink.table.codegen.CodeGenException: Unsupported call: HOP_START 
If you think this function should be supported, you can create an issue and start a discussion for it.
    at org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
    at org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:1027)
    at org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:66)

It does say is it unimplemented while it works outside the aggregating SUM. So that's what makes me thing this is a scoping issue.

Now, the thing is: I could transform this expression and do a final processing outside the aggregation, as exp(x+y) = exp(x)*exp(y); But I'm stuck with using TIMESTAMPDIFF (which did wonders in my previous issue). I have not found a way to cast TIME ATTRIBUTEs to NUMERIC types; also, I'm not comfortable exponentiating UNIX timestamps, even if I scale them down.

Anyway, this work-around would be sort of clunky and there might me another way. I don't know how I could massage scopes in this SQL piece to still 'be' in the window scope and have the start time without throwing.


回答1:


I suggest you experiment with HOP_PROCTIME() rather than HOP_START(). The differences are explained here, but the effect will be that you'll have a proctime attribute rather than a timestamp, which I'm hoping will make TIMESTAMPDIFF happy.



来源:https://stackoverflow.com/questions/54668892/having-an-equivalent-to-hop-start-inside-an-aggregation-primitive-in-flink

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!