I'm looking for a way to checkpoint DataFrames. Checkpointing is currently an operation on RDDs, but I can't find how to do it with DataFrames. persist and cache (which are synon
I think right now you'll have to do:
sc.setCheckpointDir("/DIR")
df.rdd.checkpoint()
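Then you have to perform your action on the underlying df.rdd. Calling df.ACTION will not work currently, because the DataFrame action executes its own query plan rather than the checkpointed RDD; only df.rdd.ACTION uses the checkpoint.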
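Here is a minimal, self-contained sketch of the sequence described above: set the checkpoint directory, mark the DataFrame's underlying RDD for checkpointing, then run the action on that same RDD. It assumes the Spark 2.x `SparkSession` API for setup (the answer itself uses the shell's `sc`). The helper name `checkpointAndCount`, the `spark.range` example data, and the `/tmp/spark-checkpoints` path are illustrative placeholders, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object CheckpointSketch {
  // Hypothetical helper: marks the DataFrame's underlying RDD for checkpointing,
  // runs an action on that same RDD, and returns (row count, checkpoint written?).
  def checkpointAndCount(spark: SparkSession, checkpointDir: String): (Long, Boolean) = {
    val sc = spark.sparkContext

    // Checkpoint files are written under this directory (placeholder path).
    sc.setCheckpointDir(checkpointDir)

    // Illustrative data: 100 rows in a single column "n".
    val df = spark.range(100).toDF("n")

    // Keep one reference to the underlying RDD so the action below runs on
    // exactly the RDD that was marked for checkpointing.
    val rdd = df.rdd
    rdd.cache()      // optional: avoids recomputing the RDD while the checkpoint is written
    rdd.checkpoint() // lazy: this only marks the RDD, nothing is written yet

    // The first action on this RDD computes it and then writes the checkpoint.
    // df.count() would not do this, since it runs the DataFrame's own query plan.
    val n = rdd.count()
    (n, rdd.isCheckpointed)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-checkpoint-sketch")
      .master("local[*]")
      .getOrCreate()
    val (n, checkpointed) = checkpointAndCount(spark, "/tmp/spark-checkpoints")
    println(s"count=$n checkpointed=$checkpointed")
    spark.stop()
  }
}
```

Keeping a single `val rdd` reference makes it explicit that the action runs on the RDD that was marked. Note that later Spark releases (2.1 and up) also added `Dataset.checkpoint()` directly, which returns a new checkpointed Dataset, so the RDD detour is only needed on older versions.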