How to checkpoint DataFrames?

一整个雨季 2021-02-02 09:16

I'm looking for a way to checkpoint DataFrames. Checkpoint is currently an operation on RDD but I can't find how to do it with DataFrames. persist and cache (which are synon

5 answers
  •  不知归路
    2021-02-02 09:31

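    I think right now you'll have to go through the underlying RDD:

    // The checkpoint directory must be set before checkpointing
    sc.setCheckpointDir("/DIR")
    // Marks the RDD for checkpointing; nothing is written until an action runs
    df.rdd.checkpoint()

    You then have to run your action on the underlying RDD. Calling df.ACTION currently does not trigger the checkpoint; only df.rdd.ACTION does.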
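
    To make the steps above concrete, here is a minimal sketch of the same RDD-based workaround as a small standalone program. It assumes Spark 2.x or later (it uses SparkSession, which the answer above does not mention); the checkpoint directory, the app name, and the sample data are all hypothetical placeholders. Rebuilding a DataFrame from the checkpointed RDD with createDataFrame is one possible follow-up step, not something the answer prescribes. Newer Spark releases (2.1 and up, as far as I know) also offer df.checkpoint() directly on a Dataset, which may make this workaround unnecessary.

    ```scala
    import org.apache.spark.sql.SparkSession

    object CheckpointSketch {
      def main(args: Array[String]): Unit = {
        // Assumption: a local Spark 2.x+ session, for illustration only
        val spark = SparkSession.builder()
          .appName("checkpoint-sketch")
          .master("local[*]")
          .getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical directory; on a cluster this should be a reliable store such as HDFS
        sc.setCheckpointDir("/tmp/spark-checkpoints")

        // Hypothetical sample data standing in for the real DataFrame
        val df = spark.range(0, 1000).toDF("id")

        // Keep a reference to the underlying RDD and mark it for checkpointing
        val rdd = df.rdd
        rdd.checkpoint()

        // The action has to run on the RDD itself for the checkpoint to be written
        rdd.count()
        println(s"checkpointed: ${rdd.isCheckpointed}")

        // Optional: wrap the checkpointed RDD back into a DataFrame with the original schema
        val restored = spark.createDataFrame(rdd, df.schema)
        println(s"rows after restore: ${restored.count()}")

        spark.stop()
      }
    }
    ```

    Holding the RDD in a val keeps the checkpoint call and the action on the same object, which is the point the answer makes about df.rdd.ACTION versus df.ACTION.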
