How to enable Tungsten optimization in Spark 2?

Submitted by 孤街醉人 on 2019-12-22 07:03:18

Question


I just built Spark 2 with Hive support and deployed it to a cluster running Hortonworks HDP 2.3.4. However, I find that this Spark 2.0.3 is slower than the standard Spark 1.5.3 that comes with HDP 2.3.

When I check the explain output, it seems that my Spark 2.0.3 is not using Tungsten. Do I need to create a special build to enable Tungsten?

Spark 1.5.3 explain output:

== Physical Plan ==
TungstenAggregate(key=[id#2], functions=[], output=[id#2])
TungstenExchange hashpartitioning(id#2)
TungstenAggregate(key=[id#2], functions=[], output=[id#2])
HiveTableScan [id#2], (MetastoreRelation default, testing, None)

Spark 2.0.3 explain output:

== Physical Plan ==
*HashAggregate(keys=[id#2481], functions=[])
+- Exchange hashpartitioning(id#2481, 72)
   +- *HashAggregate(keys=[id#2481], functions=[])
      +- HiveTableScan [id#2481], MetastoreRelation default, testing

Answer 1:


It still uses Tungsten; the class was renamed: https://github.com/apache/spark/commit/8900c8d8ff1614b5ec5a2ce213832fa13462b4d4




Answer 2:


The asterisk before an operator indicates that whole-stage code generation (WholeStageCodegen) was used for that operator. This is Spark 2's evolution of the original Tungsten optimizations. If you see the asterisk, then Spark 2's most optimized code path is being used. If this runs (significantly) slower than before, I would assume that there are configuration differences between your two test environments.
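Applying that rule by hand is easy for a four-line plan, but a small script helps with larger ones. The sketch below is a hypothetical helper, not part of Spark: `codegen_report` takes the physical-plan text that `explain()` prints and returns each operator together with a flag saying whether it carries the `*` marker. It assumes the Spark 2.x tree layout shown in the question; the pattern also accepts the `*(1) Operator` form that later 2.x releases print, which is an assumption about those versions rather than something stated in this thread.

```python
import re
from typing import List, Tuple

# Tree-drawing characters Spark 2.x prints before nested operators
# ("+- ", ":- ", leading spaces and ":" / "|" guides).
TREE_PREFIX = re.compile(r"^[\s:|+\-]*")

# Optional codegen marker ("*", or "*(1) " as assumed for later 2.x releases),
# followed by the operator name.
OPERATOR = re.compile(r"(\*(?:\(\d+\))?\s*)?(\w+)")


def codegen_report(plan_text: str) -> List[Tuple[str, bool]]:
    """Hypothetical helper: list (operator, in_whole_stage_codegen) pairs.

    Assumes plan_text is the physical-plan text printed by explain() in
    Spark 2.x, where a leading '*' marks whole-stage code generation.
    """
    report = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("=="):
            continue  # skip blank lines and headers like "== Physical Plan =="
        body = TREE_PREFIX.sub("", line)  # drop the tree-drawing prefix
        match = OPERATOR.match(body)
        if match is None:
            continue  # not an operator line
        # group(1) is the '*' marker (None if absent), group(2) the operator name
        report.append((match.group(2), match.group(1) is not None))
    return report


if __name__ == "__main__":
    # The Spark 2.0.3 plan from the question.
    plan = """== Physical Plan ==
*HashAggregate(keys=[id#2481], functions=[])
+- Exchange hashpartitioning(id#2481, 72)
   +- *HashAggregate(keys=[id#2481], functions=[])
      +- HiveTableScan [id#2481], MetastoreRelation default, testing"""
    for name, codegen in codegen_report(plan):
        print(f"{name:15} whole-stage codegen: {codegen}")
```

Run against the plan from the question, it flags both `HashAggregate` operators as compiled by whole-stage codegen and `Exchange` and `HiveTableScan` as not, which matches the reading above.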




Answer 3:


I would think it's enabled by default, but you can set spark.sql.tungsten.enabled=true.



Source: https://stackoverflow.com/questions/43504744/how-to-enable-tungsten-optimization-in-spark-2
