Why does Complete output mode require aggregation?

北荒 2021-02-13 05:42

I work with the latest Structured Streaming in Apache Spark 2.2 and got the following exception:

org.apache.spark.sql.AnalysisException: Complete output m

2 answers
  •  难免孤独
    2021-02-13 06:15

    From the Structured Streaming Programming Guide - other queries (excluding aggregations, mapGroupsWithState and flatMapGroupsWithState):

    Complete mode not supported as it is infeasible to keep all unaggregated data in the Result Table.

    To answer the question:

    What would happen if Spark allowed Complete output mode with no aggregations in a streaming query?

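    Probably an OutOfMemoryError (OOM), because the Result Table would have to keep every row the stream has ever received.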

    The puzzling part is why dropDuplicates("id") is not marked as aggregation.
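
The memory argument in this answer can be illustrated with a rough sketch. This is a plain-Python toy model, not Spark code: the function names and the input data are hypothetical, and it only mimics how the size of the Result Table would evolve per trigger in Complete mode, with and without an aggregation.

```python
from collections import Counter


def complete_mode_without_aggregation(batches):
    """Toy model: every row ever seen must stay in the Result Table."""
    result_table = []
    sizes = []
    for batch in batches:
        result_table.extend(batch)       # no row can ever be dropped
        sizes.append(len(result_table))  # table size after this trigger
    return sizes


def complete_mode_with_count(batches):
    """Toy model of a per-key count: one row is kept per distinct key."""
    counts = Counter()
    sizes = []
    for batch in batches:
        counts.update(batch)             # fold new rows into existing state
        sizes.append(len(counts))        # table size after this trigger
    return sizes


# Hypothetical input: 5 micro-batches of 1000 rows each, 10 distinct ids.
batches = [[i % 10 for i in range(1000)] for _ in range(5)]

unaggregated_sizes = complete_mode_without_aggregation(batches)
aggregated_sizes = complete_mode_with_count(batches)

print(unaggregated_sizes)  # [1000, 2000, 3000, 4000, 5000]
print(aggregated_sizes)    # [10, 10, 10, 10, 10]
```

In this model the unaggregated table grows with the total number of rows, while the aggregated one is bounded by the number of distinct keys. That bound can still be large when the key space is large, so the sketch shows the trend rather than a guarantee.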
