Question
- Distinct values are cached with every streamed batch of data.
- How do I build up the cache by adding the distinct values from each new batch to the already cached RDD?
Answer 1:
You cannot append data to an RDD directly, because RDDs are immutable. Instead, use union to create a new RDD from the cached one and the new batch, and then cache the result.
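
The union-then-cache approach might look roughly like the following PySpark sketch. The helper name `append_distinct` and the `demo` function are hypothetical and only for illustration. Calling `distinct()` after the union and unpersisting the old copy are assumptions about what the question needs, not something the answer spells out.

```python
def append_distinct(cached, batch):
    """Return a new cached RDD with the distinct values of `cached` plus `batch`.

    RDDs are immutable, so nothing is appended in place: a new RDD is
    built with union() and the previous cached copy is released afterwards.
    """
    updated = cached.union(batch).distinct().cache()
    updated.count()      # run an action so the new RDD is actually materialized
    cached.unpersist()   # assumption: the old cached copy is no longer needed
    return updated


def demo():
    # Assumes pyspark is installed; imported lazily so the helper above
    # can be defined without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    seen = sc.parallelize([1, 2, 3]).cache()
    for batch in ([3, 4], [4, 5, 6]):  # stand-ins for streamed batches
        seen = append_distinct(seen, sc.parallelize(batch))
    return sorted(seen.collect())
```

In a streaming job, the same helper could be called once per batch, with the variable holding the cached RDD reassigned each time. Materializing the new RDD before unpersisting the old one avoids recomputing the earlier batches.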
Source: https://stackoverflow.com/questions/34077905/spark-how-to-append-to-cached-rdd