Spark DAG differs with 'withColumn' vs 'select'

小鲜肉 2021-01-05 17:16

Context

In a recent SO post, I discovered that using withColumn may improve the DAG when dealing with stacked/chained column expressions in conjunction

2 Answers
  •  天涯浪人
    2021-01-05 17:52

    This looks like a consequence of the internal projection introduced by withColumn. It's documented in the Spark API docs for withColumn, which note that calling it many times to add columns can generate large plans.

    The official recommendation is to do as Jay recommended and use a single select instead when dealing with multiple columns.
