spark.sql vs SqlContext

盖世英雄少女心 2021-01-20 16:04

I have used SQL in Spark, in this example:

results = spark.sql("select * from ventas")

where ventas is a DataFrame that was previously registered in the catalog.

4 Answers
  •  傲寒 (OP)
     2021-01-20 16:59

    From a user's perspective (not a contributor's), I can only rehash what the developers provided in the upgrade notes:

    Upgrading From Spark SQL 1.6 to 2.0

    • SparkSession is now the new entry point of Spark that replaces the old SQLContext and HiveContext. Note that the old SQLContext and HiveContext are kept for backward compatibility. A new catalog interface is accessible from SparkSession - existing API on databases and tables access such as listTables, createExternalTable, dropTempView, cacheTable are moved here.
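    A minimal sketch of that catalog interface (the view name `ventas` and the tiny DataFrame are just illustrative, borrowed from the question):

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("catalog-demo").getOrCreate()

    # Register a small frame as a temp view named "ventas", as in the question.
    spark.createDataFrame([(1, 100.0)], ["id", "precio"]).createOrReplaceTempView("ventas")

    # The catalog API that used to live on SQLContext now sits under spark.catalog:
    table_names = [t.name for t in spark.catalog.listTables()]

    spark.catalog.cacheTable("ventas")          # mark the view for caching
    is_cached = spark.catalog.isCached("ventas")
    spark.catalog.uncacheTable("ventas")
    spark.catalog.dropTempView("ventas")        # clean up the temp view
    ```

    Note that `cacheTable` and `dropTempView` act on the session-scoped catalog, so nothing here touches any persistent metastore.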

    Before 2.0, using SQLContext took an extra step: you first had to create a SparkContext and then pass it to the SQLContext constructor. With SparkSession, they made things a lot more convenient.
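    Side by side, the two entry points look roughly like this (app names are illustrative; the pre-2.0 path is shown only for comparison and still works for backward compatibility):

    ```python
    from pyspark.sql import SparkSession

    # Spark 2.0+: one chained builder call; getOrCreate() reuses an existing session.
    spark = SparkSession.builder.appName("ventas-demo").getOrCreate()

    # Pre-2.0 equivalent:
    #   from pyspark import SparkContext
    #   from pyspark.sql import SQLContext
    #   sc = SparkContext(appName="ventas-demo")   # first build the context...
    #   sqlContext = SQLContext(sc)                # ...then wrap it for SQL

    spark.createDataFrame([(1, "x")], ["id", "tag"]).createOrReplaceTempView("ventas")
    results = spark.sql("select * from ventas")
    ```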

    If you take a look at the source code, you'll notice that the SqlContext class is mostly marked @deprecated. Closer inspection shows that its most commonly used methods simply delegate to a sparkSession.
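    You can see that delegation from the PySpark side too. A small sketch, assuming a local session (constructing SQLContext directly emits a deprecation warning in recent versions, which is the point):

    ```python
    from pyspark.sql import SparkSession, SQLContext

    spark = SparkSession.builder.appName("legacy-demo").getOrCreate()

    # The legacy context is just a thin wrapper around a SparkSession.
    sqlContext = SQLContext(spark.sparkContext, spark)

    spark.range(2).createOrReplaceTempView("nums")

    # Both call paths end up on the same session, so they see the same view:
    legacy_count = sqlContext.sql("select * from nums").count()
    modern_count = spark.sql("select * from nums").count()
    ```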

    For more info, take a look at the developer notes, the Jira issues, conference talks on Spark 2.0, and the Databricks blog.
