I am working with the latest Structured Streaming in Apache Spark 2.2 and got the following exception:
org.apache.spark.sql.AnalysisException: Complete output m
From the Structured Streaming Programming Guide - other queries (excluding aggregations, mapGroupsWithState and flatMapGroupsWithState):
Complete mode not supported as it is infeasible to keep all unaggregated data in the Result Table.
To answer the question:
What would happen if Spark allowed Complete output mode with no aggregations in a streaming query?
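Probably an OOM (OutOfMemoryError). Complete mode writes the entire Result Table on every trigger, so without an aggregation to collapse rows, every input row ever received would have to be kept.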
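To illustrate the reasoning, here is a toy model in plain Python. It is not Spark code and does not reflect Spark's internals; the function name complete_mode_state_sizes and the sample batches are made up for this sketch. It only counts how many rows a Complete-mode Result Table would have to hold after each micro-batch, with and without an aggregation by key.

```python
from collections import Counter


def complete_mode_state_sizes(batches, aggregate):
    """Return how many rows the (simulated) Result Table holds after each micro-batch."""
    counts = Counter()  # aggregated state: one entry per distinct key
    rows = []           # unaggregated state: every row ever received
    sizes = []
    for batch in batches:
        if aggregate:
            counts.update(batch)       # groupBy(key).count() style state
            sizes.append(len(counts))
        else:
            rows.extend(batch)         # nothing collapses the rows
            sizes.append(len(rows))
    return sizes


# Hypothetical input: 4 micro-batches of 1000 rows each, drawn from 3 distinct keys.
batches = [["a", "b", "c", "a"] * 250 for _ in range(4)]

print(complete_mode_state_sizes(batches, aggregate=True))   # [3, 3, 3, 3]
print(complete_mode_state_sizes(batches, aggregate=False))  # [1000, 2000, 3000, 4000]
```

With an aggregation the table is bounded by the number of distinct keys; without one it grows with the total input, which on an unbounded stream has no limit.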
The puzzling part is why dropDuplicates("id") is not marked as an aggregation.