In a recent SO-post, I discovered that using withColumn may improve the DAG when dealing with stacked/chain column expressions in conjunction
withColumn
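This looks like a consequence of the internal projection introduced by withColumn. It's documented in the Spark docs: each call adds a projection internally, so calling it many times (for instance in a loop) can generate large plans and cause performance issues.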
The official recommendation is to do as Jay suggested and use a single select when adding multiple columns.