How to checkpoint DataFrames?

一整个雨季 2021-02-02 09:16

I'm looking for a way to checkpoint DataFrames. Checkpoint is currently an operation on RDD but I can't find how to do it with DataFrames. persist and cache (which are synon

5 Answers
  •  清酒与你
    2021-02-02 09:43

    The original question is about Scala Spark, but I think it is useful to add the PySpark syntax as well, which is very similar. Note that unlike cache/persist, checkpoint does not operate in place: it returns a new DataFrame, so the result has to be assigned back (this tripped me up initially):

    spark.sparkContext.setCheckpointDir("/foo/bar")
    df = df.checkpoint()
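
If it helps, the two calls above can be wrapped in a small helper so the returned DataFrame is never accidentally dropped. This is only a sketch: the name `checkpoint_dataframe` and its parameters are my own and not part of Spark, while `setCheckpointDir` and `DataFrame.checkpoint(eager=...)` are the PySpark calls it relies on.

```python
def checkpoint_dataframe(spark, df, checkpoint_dir, eager=True):
    """Checkpoint `df` and return the checkpointed DataFrame.

    Hypothetical helper, not part of Spark. `spark` is expected to be a
    SparkSession and `df` a PySpark DataFrame.
    """
    # The checkpoint directory must be set before checkpoint() is called.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # checkpoint() returns a new DataFrame; the input `df` is left unchanged.
    # eager=True (the PySpark default) materializes the checkpoint right away.
    return df.checkpoint(eager=eager)
```

Usage would look like `df = checkpoint_dataframe(spark, df, "/foo/bar")`, again assigning the result back to `df`.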
    
