I am new to Spark Streaming. I understand that the window size needs to be a multiple of the batch interval. But how does the sliding interval work? If I have 3 as the window size and 2 as the sliding interval…
Here is a link to the documentation.
Let's walk through these concepts:
Refer to the image above, where the window size is 3 times the batch interval and the sliding interval is 2 times the batch interval.
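To make the figure concrete, here is a small plain-Python simulation (not the Spark API) that lists which batches fall into each window when the window is 3 batches long and slides by 2. It assumes time is counted in batch intervals and that only full windows are emitted, with the first one ending at batch 3. Spark's actual alignment of the first window may differ.

```python
# Plain-Python simulation of sliding windows; this is NOT the Spark API.
# Assumption: time is counted in batch intervals, batches are numbered from 1,
# and only full windows are emitted (the first one ends at batch `window_len`).

def window_batches(window_len, slide, num_batches):
    """Return the batch numbers covered by each window."""
    windows = []
    # A new window is produced every `slide` batches.
    for end in range(window_len, num_batches + 1, slide):
        # Each window covers the `window_len` most recent batches, ending at `end`.
        windows.append(list(range(end - window_len + 1, end + 1)))
    return windows

for w in window_batches(window_len=3, slide=2, num_batches=7):
    print(w)
# [1, 2, 3]
# [3, 4, 5]
# [5, 6, 7]
```

Consecutive windows share `window_len - slide = 1` batch (batches 3 and 5 here), which is where the overlap comes from.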
As for why the window and sliding intervals must be multiples of the batch interval: otherwise a window boundary would fall in the middle of a batch, and a window can only include or exclude whole batches.
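The rule can be illustrated with a minimal plain-Python check (again not Spark's own code; the function name `windows_in_batches` is hypothetical). It converts the window and slide durations into a whole number of batches and rejects durations that would put a boundary partway through a batch. Durations are integer seconds to avoid floating-point remainders.

```python
# Plain-Python sketch of the "must be a multiple" rule; NOT the Spark API.
# Durations are whole seconds (integers) to avoid floating-point remainders.

def windows_in_batches(batch_s, window_s, slide_s):
    """Return (window, slide) measured in batches, or raise ValueError
    if either duration would make a window boundary fall inside a batch."""
    for name, value in (("window", window_s), ("slide", slide_s)):
        if value % batch_s != 0:
            # A duration of this length puts a boundary partway through a batch.
            raise ValueError(
                f"{name} duration {value}s is not a multiple of "
                f"the {batch_s}s batch interval"
            )
    return window_s // batch_s, slide_s // batch_s

print(windows_in_batches(batch_s=1, window_s=3, slide_s=2))  # (3, 2)

try:
    # With 2 s batches, a 5 s window spans 2.5 batches, so one of its
    # boundaries would fall in the middle of a batch.
    windows_in_batches(batch_s=2, window_s=5, slide_s=2)
except ValueError as err:
    print(err)
```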
If you have 3 as the window size and 2 as the sliding interval (see the image), then yes, your word counts will overlap: each window shares one batch with the previous one, so words from that batch are counted in both windows.

In general, you use a window when you want to compute something over a limited, recent time span, such as current news or tweets, where you don't need all historical data for the analysis.
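The word-count overlap can be sketched in plain Python. In real Spark Streaming you would typically use a windowed operation such as `reduceByKeyAndWindow` on a DStream of `(word, 1)` pairs; the stand-in below only simulates the result. The batch contents are made up for illustration, and it is assumed that only full windows are emitted, the first one ending at batch 3.

```python
from collections import Counter

# Hypothetical input: the words received in each batch (batch 1 first).
batches = [
    ["spark"],            # batch 1
    ["spark", "stream"],  # batch 2
    ["window"],           # batch 3
    ["spark"],            # batch 4
    ["stream"],           # batch 5
]

def windowed_word_counts(batches, window_len, slide):
    """Count words over the last `window_len` batches, every `slide` batches.

    Only full windows are emitted; the first ends at batch `window_len`.
    """
    results = []
    for end in range(window_len, len(batches) + 1, slide):
        counts = Counter()
        # Batches are 1-numbered, so batch `end` is at list index end - 1.
        for batch in batches[end - window_len:end]:
            counts.update(batch)
        results.append((end, dict(counts)))
    return results

for end, counts in windowed_word_counts(batches, window_len=3, slide=2):
    print(f"window ending at batch {end}: {counts}")
# window ending at batch 3: {'spark': 2, 'stream': 1, 'window': 1}
# window ending at batch 5: {'window': 1, 'spark': 1, 'stream': 1}
```

The word "window" from batch 3 appears in both results, because batch 3 belongs to both windows.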