Whole-Stage Code Generation in Spark 2.0

伪装坚强ぢ 2021-02-15 15:17

I heard about Whole-Stage Code Generation for SQL, which is used to optimize queries, through p539-neumann.pdf and sparksql-sql-codegen-is-not-giving-any-improvemnt.

But

1 answer
  •  慢半拍i (OP)
    2021-02-15 15:44

    In Spark 2.0, code generation is enabled by default, so most DataFrame queries can take advantage of the performance improvements without any extra configuration. There are some potential exceptions, such as queries that use Python UDFs, which may slow things down.
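One way to check this on your own queries is to look at the physical plan. The snippet below is a minimal sketch, assuming PySpark is installed and a local session is acceptable; the function name `show_codegen_plan` and the sample query are made up for illustration. It relies on the `spark.sql.codegen.wholeStage` setting and on `DataFrame.explain()`, where operators that are part of a generated stage are marked with an asterisk.

```python
# Sketch only: the helper name and the sample query are illustrative assumptions.
CODEGEN_KEY = "spark.sql.codegen.wholeStage"


def show_codegen_plan(enabled=True):
    """Print the physical plan of a small query with codegen on or off."""
    # Imported lazily so this file still loads where PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("codegen-check")
        .getOrCreate()
    )
    # Whole-stage codegen is a runtime SQL setting, so it can be toggled here.
    spark.conf.set(CODEGEN_KEY, str(enabled).lower())
    print(CODEGEN_KEY, "=", spark.conf.get(CODEGEN_KEY))

    # A simple filter + aggregate over a generated range of ids.
    df = spark.range(1000).filter(F.col("id") % 2 == 0).agg(F.sum("id"))
    df.explain()  # operators inside a generated stage carry an asterisk
    spark.stop()
```

Calling `show_codegen_plan(True)` and then `show_codegen_plan(False)` and comparing the two plans should make the difference visible; the exact plan text depends on the Spark version.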

    Code generation is one of the primary components of the Spark SQL engine's Catalyst Optimizer. In brief, the Catalyst Optimizer does the following: (1) analyzes a logical plan to resolve references, (2) optimizes the logical plan, (3) performs physical planning, and (4) generates code.
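To give a feel for what the last step buys you, here is a toy illustration in plain Python. It is not Spark's implementation (Spark generates Java source, not Python), and every name in it is hypothetical. It contrasts an iterator-style pipeline, where each operator pulls rows from its child, with a single fused loop whose source is generated and compiled at runtime for one specific query.

```python
def volcano_sum(rows, threshold):
    """Iterator-style pipeline: scan -> filter -> sum, one generator per operator."""
    def scan():
        for r in rows:
            yield r

    def keep_greater(child):
        for r in child:
            if r > threshold:
                yield r

    total = 0
    for r in keep_greater(scan()):
        total += r
    return total


def generate_fused_sum(threshold):
    """Generate source for one loop doing the same work, then compile it."""
    # The filter constant is baked into the generated source, so the compiled
    # function is specialized to this one query.
    source = (
        "def fused_sum(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if r > {threshold!r}:\n"
        "            total += r\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated code at runtime
    return namespace["fused_sum"], source


rows = list(range(10))
fused_sum, source = generate_fused_sum(5)
print(source)
# Both strategies must agree on the result.
assert volcano_sum(rows, 5) == fused_sum(rows)
```

The fused version avoids the per-row hand-off between operators, which is the kind of overhead whole-stage code generation is meant to remove.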

    Two great references for all of this are the following blog posts:

    • Deep Dive into Spark SQL’s Catalyst Optimizer

    • Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop

    HTH!
