Where is df.cache() stored?

清歌不尽 2021-02-03 15:35

I would like to understand on which node (driver or worker/executor) the data cached by the code below is stored:

df.cache() //df is a large dataframe (200GB)

And

4 answers
  •  难免孤独
    2021-02-03 16:21

    Just adding my 25 cents. Calling cache() on a Spark DataFrame loads the data into executor memory, not driver memory, which is what you want. Here's a snapshot taken with 50% of the data loaded, after a df.cache().count() I just ran.

    cache() persists to memory and disk, as described by koiralo, and is lazily evaluated: nothing is stored until an action such as count() runs.
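To make the lazy behaviour concrete, here is a minimal sketch in Scala. The object name CacheDemo, the small generated DataFrame, and the local[*] master are assumptions for illustration only; in local mode the driver and executor share one JVM, so the driver/executor split only becomes visible on a real cluster (for example in the Storage tab of the Spark UI).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; not from the original post.
object CacheDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-demo")
      .master("local[*]") // assumption: local run, stands in for a real cluster
      .getOrCreate()

    // Small stand-in for the 200GB DataFrame in the question.
    val df = spark.range(0L, 1000000L).toDF("id")

    // Before cache(): no storage level is assigned.
    println(s"before cache(): ${df.storageLevel}")

    // cache() only marks the DataFrame for caching; no data is stored yet.
    df.cache()
    println(s"after cache():  ${df.storageLevel}")

    // The first action materializes the cached partitions on the executors.
    val rows = df.count()
    println(s"rows counted: $rows")

    // Release the cached blocks once they are no longer needed.
    df.unpersist()
    spark.stop()
  }
}
```

For a DataFrame, cache() is shorthand for persist() with the default storage level (memory and disk), so partitions that do not fit in executor memory spill to the executors' local disks rather than being sent to the driver.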

    Cachedtable() stores on disk and is resilient to node failures for this reason.

    Credit: https://forums.databricks.com/answers/63/view.html
