Why does Complete output mode require aggregation?

北荒

I'm working with Structured Streaming in Apache Spark 2.2 (the latest version) and got the following exception:

org.apache.spark.sql.AnalysisException: Complete output m

2 answers

难免孤独

2021-02-13 06:15

From the Structured Streaming Programming Guide, on other queries (excluding aggregations, mapGroupsWithState and flatMapGroupsWithState):

Complete mode not supported as it is infeasible to keep all unaggregated data in the Result Table.

To answer the question:

What would happen if Spark allowed Complete output mode with no aggregations in a streaming query?

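Probably an OutOfMemoryError. Complete mode writes the entire Result Table to the sink on every trigger, and without an aggregation that table would have to hold every row ever received, so it would grow without bound.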
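To make the memory argument concrete, here is a toy model in plain Scala. It uses no Spark at all, and `CompleteModeGrowth` and its methods are hypothetical names for illustration. It assumes Complete mode must keep the whole Result Table so it can re-emit it on each trigger, and compares the table size for a stream over a small key space with and without a per-key count.

```scala
// Toy model (not Spark code): size of the Result Table that Complete mode
// would have to keep and re-emit on every trigger.
object CompleteModeGrowth {
  // 1000 micro-batches of incoming ids drawn from a small key space.
  val batches: Seq[Seq[String]] =
    Seq.fill(1000)(Seq("a", "b", "c", "a"))

  // No aggregation: the Result Table is every row ever received.
  def unaggregatedSize(bs: Seq[Seq[String]]): Int =
    bs.foldLeft(Vector.empty[String])(_ ++ _).size

  // With a per-key count (like groupBy("id").count()): one row per distinct key.
  def aggregatedSize(bs: Seq[Seq[String]]): Int =
    bs.foldLeft(Map.empty[String, Long]) { (counts, batch) =>
      batch.foldLeft(counts)((m, id) => m.updated(id, m.getOrElse(id, 0L) + 1L))
    }.size

  def main(args: Array[String]): Unit = {
    println(s"unaggregated rows: ${unaggregatedSize(batches)}")
    println(s"aggregated rows:   ${aggregatedSize(batches)}")
  }
}
```

With 1,000 batches of 4 rows, the unaggregated table holds 4,000 rows and keeps growing with the stream, while the counted table stays at 3 rows, one per distinct key.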

The puzzling part is why dropDuplicates("id") is not treated as an aggregation.

粉色の甜心

2021-02-13 06:25

I think the problem is the output mode. Instead of OutputMode.Complete, use OutputMode.Append, as shown below.

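// Spark 2.2; `ids` is the streaming Dataset from the question
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._

val q = ids
  .writeStream
  .format("memory")
  .queryName("dups")
  .outputMode(OutputMode.Append) // emit each new row once instead of the whole table
  .trigger(Trigger.ProcessingTime(30.seconds))
  .option("checkpointLocation", "checkpoint-dir")
  .start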
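For intuition on why Append works with dropDuplicates("id"), the toy model below is written in plain Scala. It is not Spark's implementation, and `AppendDedup` is a hypothetical name. It assumes each trigger keeps a set of already-seen ids and emits only rows whose id is new, so nothing already written to the sink ever changes. That is the condition Append mode requires.

```scala
// Toy model (not Spark code) of dropDuplicates("id") under Append mode.
object AppendDedup {
  // For each micro-batch, returns the rows Append mode would emit,
  // plus the final set of ids held as deduplication state.
  def run(batches: Seq[Seq[String]]): (Seq[Seq[String]], Set[String]) = {
    val (emittedRev, seen) =
      batches.foldLeft((List.empty[Seq[String]], Set.empty[String])) {
        case ((out, state), batch) =>
          // Keep one occurrence of each id that has never been seen before.
          val fresh = batch.distinct.filterNot(state.contains)
          (fresh :: out, state ++ fresh)
      }
    (emittedRev.reverse, seen)
  }

  def main(args: Array[String]): Unit = {
    val (emitted, state) = run(Seq(Seq("a", "b", "a"), Seq("b", "c"), Seq("a")))
    emitted.zipWithIndex.foreach { case (rows, i) => println(s"batch $i emits: $rows") }
    println(s"dedup state: $state")
  }
}
```

Each batch emits only new ids (`a, b`, then `c`, then nothing). The deduplication state itself still grows with the number of distinct ids; the programming guide notes that without a watermark the query stores data from all past records as state.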
