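Take word count as an example. Once the application has started and is running long-term, suppose it receives the word "Spark"; the result table then contains the row (Spark, 1). That row has to stay in state for as long as the query runs, because the count must be updated whenever the word arrives again, so the state keeps growing with every new word.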
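To avoid keeping this ever-growing amount of data in memory, Spark Structured Streaming uses watermarks. The main idea is to keep in memory only the data that falls within a specific time window. Once a window falls behind the watermark, its state can be dropped from memory, and its results live on only in whatever the query has already written to its sink (for example, the file system). You can read about watermarks here or here.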
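
To make the idea concrete, here is a minimal plain-Python sketch of how a watermark bounds the state of a windowed word count. It is a toy model, not Spark's implementation: the class `WindowedWordCount`, the constants `WINDOW_SECONDS` and `WATERMARK_DELAY`, and the `(event_time, word)` input format are all made up for illustration. In Spark itself, the corresponding pieces are `withWatermark` on the streaming DataFrame followed by a `groupBy` on `window(...)`.

```python
from collections import defaultdict

WINDOW_SECONDS = 10    # assumed tumbling window length (illustrative value)
WATERMARK_DELAY = 20   # assumed allowed lateness in seconds (illustrative value)


class WindowedWordCount:
    """Toy model of watermark-based state cleanup (hypothetical, not Spark code)."""

    def __init__(self):
        self.state = defaultdict(int)  # (window_start, word) -> count
        self.max_event_time = 0        # latest event time seen so far

    def process_batch(self, events):
        """Process one micro-batch of (event_time_seconds, word) pairs."""
        # The watermark used for this batch comes from earlier batches.
        watermark = self.max_event_time - WATERMARK_DELAY
        for event_time, word in events:
            window_start = event_time - event_time % WINDOW_SECONDS
            if window_start + WINDOW_SECONDS <= watermark:
                continue  # window already closed: the late record is ignored
            self.state[(window_start, word)] += 1
            self.max_event_time = max(self.max_event_time, event_time)

        # Advance the watermark, then evict every window that ended before it.
        watermark = self.max_event_time - WATERMARK_DELAY
        expired = [key for key in self.state
                   if key[0] + WINDOW_SECONDS <= watermark]
        for key in expired:
            del self.state[key]
        return watermark
```

Feeding the model a batch with event times far ahead of the earlier ones advances the watermark and removes the old windows from `state`, so memory use depends on the watermark delay rather than on how long the application has been running.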