I'm looking for a way to checkpoint DataFrames. Checkpoint is currently an operation on RDD but I can't find how to do it with DataFrames. persist and cache (which are synon
The original question is about Scala Spark, but I think it is useful to add the PySpark syntax as well, which is very similar. Note that unlike cache/persist, checkpoint does not operate in place (this tripped me up initially): it returns a new DataFrame, so you have to assign the result.
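For reference, here is a minimal sketch of what that looks like in PySpark, assuming Spark 2.1 or later (where `DataFrame.checkpoint` is available). The helper name `checkpointed`, the `demo` function, the app name, and the temporary checkpoint directory are all made up for illustration; on a real cluster the checkpoint directory would normally be a path on HDFS or similar shared storage.

```python
import tempfile


def checkpointed(spark, df, checkpoint_dir=None, eager=True):
    """Return a checkpointed copy of df (hypothetical helper, not part of Spark).

    A checkpoint directory must be set on the SparkContext before the
    first checkpoint; pass checkpoint_dir to set it here.
    """
    if checkpoint_dir is not None:
        spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # checkpoint() does NOT modify df; it returns a new DataFrame
    # whose lineage has been truncated.
    return df.checkpoint(eager=eager)


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("checkpoint-sketch")  # illustrative name
        .getOrCreate()
    )
    df = spark.range(10)

    # Wrong: the result is discarded, df still carries its full lineage.
    # df.checkpoint()

    # Right: keep the DataFrame that checkpoint() returns.
    df = checkpointed(spark, df, checkpoint_dir=tempfile.mkdtemp())
    df.show()
    spark.stop()
```

Calling `demo()` runs the whole thing against a local session. The important line is the reassignment `df = ...`: with cache/persist you can call the method and keep using the same variable, but with checkpoint only the returned DataFrame is the checkpointed one.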