I am trying to use rowNumber in Spark DataFrames. My queries work as expected in the Spark shell, but when I write them out in Eclipse and compile a jar, I am facing an error.
For Spark 2.0, it is recommended to use SparkSession as the single entry point. It eliminates the HiveContext/SQLContext confusion issue.
import org.apache.spark.sql.SparkSession
val session = SparkSession.builder
.master("local")
.appName("application name")
.getOrCreate()
Check out this databricks article for how to use it.
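Since the question is about rowNumber, here is a minimal sketch of the same kind of window-function query against the SparkSession entry point (the DataFrame and column names are made up for illustration; in 2.x the snake_case row_number is the current name of the function):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import session.implicits._

// Toy DataFrame just to exercise the window function
val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")
val byKey = Window.partitionBy("key").orderBy("value")
df.withColumn("rn", row_number().over(byKey)).show()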
I have already answered a similar question before. The error message says it all: with Spark versions before 2.x, you'll need a HiveContext in your application jar; there is no other way around it.
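For example, here is a minimal sketch of a standalone Spark 1.x application (the object and column names are illustrative, and the jar must also be built against the spark-hive artifact, e.g. spark-hive_2.10 for 1.6):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

object RowNumberApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("row-number-demo").setMaster("local[*]"))
    // A HiveContext, not a plain SQLContext: window functions in Spark 1.x require it
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")
    val byKey = Window.partitionBy("key").orderBy("value")
    df.withColumn("rn", rowNumber().over(byKey)).show()
  }
}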
You can read further about the difference between SQLContext and HiveContext here.
Spark SQL has a SQLContext and a HiveContext. HiveContext is a superset of SQLContext, and the Spark community suggests using the HiveContext. You can see that when you run spark-shell, which is your interactive driver application: it automatically creates a SparkContext defined as sc and a HiveContext defined as sqlContext. The HiveContext allows you to execute SQL queries as well as Hive commands.
You can check that inside your spark-shell:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74)
scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
res0: Boolean = true
scala> sqlContext.isInstanceOf[org.apache.spark.sql.SQLContext]
res1: Boolean = true
scala> sqlContext.getClass.getName
res2: String = org.apache.spark.sql.hive.HiveContext
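Because the shell's sqlContext is a HiveContext, a window-function query like the one in the question works there out of the box. A sketch of such a check directly in the shell (toy data, output elided):
scala> import org.apache.spark.sql.expressions.Window
scala> import org.apache.spark.sql.functions.rowNumber
scala> val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")
scala> df.withColumn("rn", rowNumber().over(Window.partitionBy("key").orderBy("value"))).show()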
By inheritance, a HiveContext is actually an SQLContext, but the reverse is not true. You can check the source code if you are interested in how HiveContext inherits from SQLContext.
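A quick way to see the one-way relationship is that the compiler accepts a HiveContext wherever an SQLContext is expected, but not the other way around (a sketch assuming an existing SparkContext named sc, as in the shell):
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

def needsSql(ctx: SQLContext): Unit = ()

val hive = new HiveContext(sc)
needsSql(hive)  // compiles: every HiveContext is an SQLContext
// val h: HiveContext = new SQLContext(sc)  // would not type-check: an SQLContext is not a HiveContext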
Since Spark 2.0, you just need to create a SparkSession (as the single entry point), which removes the HiveContext/SQLContext confusion issue.
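If your application also needs the Hive features that HiveContext provided (such as reading Hive tables), the 2.x equivalent is a SparkSession with Hive support enabled; a minimal sketch:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("application name")
  .enableHiveSupport()  // takes over the role of HiveContext in Spark 2.x
  .getOrCreate()

spark.sql("SHOW TABLES").show()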