I run an action twice, and the second run takes very little time, so I suspect that Spark automatically caches some results. But I didn't find any source confirming this.
From the documentation:
Spark also automatically persists some intermediate data in shuffle operations (e.g. reduceByKey), even without users calling persist. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call persist on the resulting RDD if they plan to reuse it.
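The underlying filesystem (through the operating system's page cache) also caches disk reads, so input files read during the first run are often served from memory on the second.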