KSQL Windowed Aggregation Stream

喜欢而已 submitted on 2020-06-26 12:24:13

Question


I am trying to group events by one of their properties and over time using KSQL windowed aggregation, specifically the session window.

I have a STREAM made from a Kafka topic with the TIMESTAMP property correctly specified.

When I try to create a STREAM with a session window using a query like:

CREATE STREAM SESSION_STREAM AS
SELECT ...
  FROM EVENT_STREAM
WINDOW SESSION (5 MINUTES)
   GROUP BY ...;

I always get the error:

Your SELECT query produces a TABLE. Please use CREATE TABLE AS SELECT statement instead.

Is it possible to create a STREAM with a Windowed Aggregation?


When I try as suggested to create a TABLE and then a STREAM that contains all the session starting events, with a query like:

CREATE STREAM SESSION_START_STREAM AS
SELECT *
  FROM SESSION_TABLE
 WHERE WINDOWSTART=WINDOWEND;

KSQL informs me that:

KSQL does not support persistent queries on windowed tables

How can I create a STREAM of events starting a session window in KSQL?


Answer 1:


Your CREATE STREAM statement, if switched to a CREATE TABLE statement, will create a table that is constantly being updated. The sink topic SESSION_STREAM will contain the stream of changes to the table, i.e. its changelog.

ksqlDB models this as a TABLE, because it has TABLE semantics, i.e. only a single row can exist in the table with any specific key. However, the changelog will contain the STREAM of changes that have been applied to the table.

If what you want is a topic containing all the sessions then something like this will create that:

-- create a stream with a new 'data' topic:
CREATE STREAM DATA (USER_ID INT) 
    WITH (kafka_topic='data', value_format='json');

-- create a table that tracks user interactions per session:
CREATE TABLE SESSIONS AS
SELECT USER_ID, COUNT(USER_ID) AS COUNT
  FROM DATA
WINDOW SESSION (5 SECONDS)
   GROUP BY USER_ID;

This will create a SESSIONS topic that contains the changes to the SESSIONS table: i.e. its changelog.
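
One way to see this changelog for yourself is to read the topic, or the table, from the ksqlDB CLI. The following is only a minimal sketch. It assumes a ksqlDB version that supports PRINT and push queries with EMIT CHANGES, and that the table and its sink topic are both named SESSIONS as described above:

```sql
-- Assumption: read from the start of the topic rather than only new records.
SET 'auto.offset.reset' = 'earliest';

-- Dump the raw changelog records written to the table's sink topic.
-- This keeps running until interrupted (Ctrl-C in the CLI).
PRINT 'SESSIONS' FROM BEGINNING;

-- Alternatively, subscribe to changes to the table with a push query.
SELECT * FROM SESSIONS EMIT CHANGES;
```

As a user's session grows, expect to see further records for that user, since each change to the table is written to the topic.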

If you want to convert this into a stream of session start events, then unfortunately ksqlDB doesn't yet allow you to create a stream directly from the table, but you can create a stream over the table's changelog:

-- Create a stream over the existing `SESSIONS` topic.
-- Note it states the window_type is 'Session'.
CREATE STREAM SESSION_STREAM (ROWKEY INT KEY, COUNT BIGINT) 
   WITH (kafka_topic='SESSIONS', value_format='JSON', window_type='Session');

-- Create a stream of window start events:
CREATE STREAM SESSION_STARTS AS 
    SELECT * FROM SESSION_STREAM 
    WHERE WINDOWSTART = WINDOWEND;
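
To try this out, you can write a few test rows to the source stream and watch the resulting stream with a push query. This is only a sketch. The USER_ID values are made up for illustration, and it assumes a ksqlDB version that supports INSERT INTO ... VALUES and EMIT CHANGES:

```sql
-- Assumption: read existing records too, not only ones produced after the query starts.
SET 'auto.offset.reset' = 'earliest';

-- Hypothetical test data: two events for user 1 and one event for user 2.
INSERT INTO DATA (USER_ID) VALUES (1);
INSERT INTO DATA (USER_ID) VALUES (1);
INSERT INTO DATA (USER_ID) VALUES (2);

-- Watch the session start events as they are produced.
SELECT * FROM SESSION_STARTS EMIT CHANGES;
```

A session holding a single event has equal start and end timestamps, which is why the filter picks out the first event of each session. Depending on your record caching settings, intermediate updates may be collapsed before they reach the topic, so it is worth checking that every session start actually shows up under your configuration.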

Note, with the upcoming 0.10 release you'll be able to name the key column in the SESSION_STREAM correctly:

CREATE STREAM SESSION_STREAM (USER_ID INT KEY, COUNT BIGINT) 
   WITH (kafka_topic='SESSIONS', value_format='JSON', window_type='Session');


Source: https://stackoverflow.com/questions/61961255/ksql-windowed-aggregation-stream
