Why does Complete output mode require aggregation?

北荒 2021-02-13 05:42

I work with the latest Structured Streaming in Apache Spark 2.2 and got the following exception:

org.apache.spark.sql.AnalysisException: Complete output m

2 answers
  • 2021-02-13 06:15

    The Structured Streaming Programming Guide says this about other queries (those without aggregations, mapGroupsWithState or flatMapGroupsWithState):

    Complete mode not supported as it is infeasible to keep all unaggregated data in the Result Table.

    To answer the question:

    What would happen if Spark allowed Complete output mode with no aggregations in a streaming query?

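    Probably an OutOfMemoryError: with no aggregation to condense the input, the Result Table would have to hold every row the stream has ever received.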

    The puzzling part is why dropDuplicates("id") is not marked as aggregation.
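
    For contrast, here is a minimal sketch of a query that Complete mode does accept, because the Result Table holds one row per group rather than every input row. The question's definition of ids is not shown in full, so the rate source (which I believe needs Spark 2.3 or later), the derived id column and the query names below are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode
import scala.util.Try

// In spark-shell a session already exists; getOrCreate simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("complete-mode-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

// Assumed stand-in for the question's `ids`: the built-in "rate" test source
// emits (timestamp, value) rows; `id` is derived so that only 10 keys exist.
val ids = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 10L)
  .load()
  .withColumn("id", $"value" % 10)

// No aggregation: starting this in Complete mode is rejected by the analyzer.
val rejected = Try {
  ids.writeStream
    .format("memory")
    .queryName("raw_ids") // hypothetical name
    .outputMode(OutputMode.Complete)
    .start()
}

// With an aggregation the Result Table is bounded by the number of groups.
val counts = ids.groupBy("id").count()

val q = counts.writeStream
  .format("memory")
  .queryName("id_counts") // hypothetical name
  .outputMode(OutputMode.Complete)
  .start()
```

    Once a few batches have run, spark.table("id_counts").show() should list at most ten rows, however long the query stays up. Call q.stop() when done.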

  • 2021-02-13 06:25

    I think the problem is the output mode. Instead of OutputMode.Complete, use OutputMode.Append, as shown below.

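    scala> import org.apache.spark.sql.streaming.{OutputMode, Trigger}
    scala> import scala.concurrent.duration._  // enables 30.seconds
    scala> val q = ids
        .writeStream
        .format("memory")                             // in-memory sink, for debugging
        .queryName("dups")                            // name of the resulting in-memory table
        .outputMode(OutputMode.Append)                // emit only rows added since the last trigger
        .trigger(Trigger.ProcessingTime(30.seconds))  // run a micro-batch every 30 seconds
        .option("checkpointLocation", "checkpoint-dir")
        .start()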
    