I would like to understand on which node (driver or worker/executor) the data is stored when the code below runs:
df.cache() // df is a large DataFrame (~200 GB)
Just adding my 2 cents. A Spark DataFrame's cache() loads the data into executor memory, not driver memory, which is what's desired. Here's a snapshot showing roughly 50% of the data loaded after a df.cache().count() I just ran.
cache() persists to memory and disk (for DataFrames the default storage level is MEMORY_AND_DISK), as described by koiralo, and it is lazily evaluated: nothing is materialized until an action runs.
spark.catalog.cacheTable() behaves similarly and can spill to disk; cached data is resilient to node failures because any lost partitions are recomputed from the lineage.
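For what it's worth, here is a short sketch of the laziness and of choosing a storage level explicitly with persist(). It assumes a running SparkSession and the df from the question; the exact levels you pick are illustrative:

```scala
import org.apache.spark.storage.StorageLevel

// cache() is lazy: it only marks the plan for caching.
// The DataFrame default is MEMORY_AND_DISK, so partitions that
// don't fit in executor memory spill to executor-local disk.
df.cache()

// An action materializes the cache on the executors;
// the driver only ever holds the query plan, not the 200 GB.
df.count()

// persist() lets you choose the level explicitly, e.g. serialized
// storage that trades CPU for a smaller memory footprint.
df.unpersist()
df.persist(StorageLevel.MEMORY_AND_DISK_SER)
df.count()

// Free the cached blocks when you're done.
df.unpersist()
```

You can confirm where the blocks live under the Storage tab of the Spark UI: the cached partitions are reported per executor.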
Credit: https://forums.databricks.com/answers/63/view.html