I have a DataFrame and I want to roll the data up into 7-day buckets and apply some aggregations to some of the columns.
I have a pyspark sql DataFrame like ------
<
The error kind of says everything:
py4j.protocol.Py4JJavaError: An error occurred while calling o138.select.
: org.apache.spark.sql.AnalysisException: Could not resolve window function 'min'. Note that, using window functions currently requires a HiveContext;
You'll need a version of Spark that supports Hive (i.e. built with Hive support); then you can declare a HiveContext.
In Scala:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
and then use that context to perform your window function.
In Python:
# sc is an existing SparkContext.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
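Once you have a Hive-enabled context, the `min(...) OVER (...)` query resolves. To illustrate what such a window function actually computes without needing a Spark installation, here is the same kind of rolling 7-day minimum run against SQLite from the standard library (window functions are standard SQL; the table and column names here are made up for the example):

```python
import sqlite3

# Hypothetical data: (day, value) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day INTEGER, value INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 5), (2, 3), (4, 7), (9, 2), (10, 8)])

# Rolling 7-day minimum: for each row, min(value) over the 6 preceding
# days plus the current day. This is the kind of query that fails on a
# plain SQLContext but works once window functions are supported.
rows = conn.execute("""
    SELECT day,
           min(value) OVER (ORDER BY day
                            RANGE BETWEEN 6 PRECEDING AND CURRENT ROW)
    FROM events
    ORDER BY day
""").fetchall()
print(rows)  # [(1, 5), (2, 3), (4, 3), (9, 2), (10, 2)]
```

The same frame clause (`RANGE BETWEEN 6 PRECEDING AND CURRENT ROW`, ordered by a day column) carries over to the Spark SQL query once the HiveContext is in place.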
You can read further about the difference between SQLContext and HiveContext here.
SparkSQL has a SQLContext and a HiveContext. HiveContext is a superset of SQLContext, and the Spark community suggests using the HiveContext. You can see that when you run spark-shell, your interactive driver application: it automatically creates a SparkContext defined as sc and a HiveContext defined as sqlContext. The HiveContext allows you to execute SQL queries as well as Hive commands. The same behavior occurs for pyspark.
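As an aside, the 7-day roll-up from the question amounts to bucketing each row by `(date - first_date) // 7` and aggregating within each bucket. A plain-Python sketch of that grouping logic (dates, values, and the choice of `min` as the aggregation are all hypothetical):

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (event_date, value) pairs.
rows = [
    (date(2016, 1, 1), 10),
    (date(2016, 1, 3), 5),
    (date(2016, 1, 9), 7),
    (date(2016, 1, 16), 2),
]

# Bucket each row into a 7-day window counted from the earliest date --
# the same grouping a 7-day roll-up in Spark performs.
start = min(d for d, _ in rows)
buckets = defaultdict(list)
for d, v in rows:
    buckets[(d - start).days // 7].append(v)

# min() stands in for whatever aggregation is needed per bucket.
weekly_min = {week: min(vs) for week, vs in sorted(buckets.items())}
print(weekly_min)  # {0: 5, 1: 7, 2: 2}
```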