We have a Hive warehouse and want to use Spark for various tasks (mainly classification), at times writing the results back as a Hive table. For example, we wrote the following:
What version of Spark are you using?
This answer is based on Spark 1.6 and uses DataFrames.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
// Spark 1.6-style contexts; the app name is arbitrary
val conf = new SparkConf().setAppName("HiveAggregation")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
// Sample data: (ID, Categ, Amnt)
val client = Seq((1, "A", 10), (2, "A", 5), (3, "B", 56)).toDF("ID", "Categ", "Amnt")
// Total amount and row count per category
client.groupBy("Categ").agg(sum("Amnt").as("Sum"), count("ID").as("count")).show()
+-----+---+-----+
|Categ|Sum|count|
+-----+---+-----+
| A| 15| 2|
| B| 56| 1|
+-----+---+-----+
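Since the question also asks about writing results back as a Hive table, here is a minimal sketch of that step for Spark 1.6, assuming your Spark build includes Hive support; the table name "clients_agg" is a placeholder I chose, not something from the original question.

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.spark.sql.hive.HiveContext
// A HiveContext (rather than a plain SQLContext) is needed to reach the Hive metastore
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._
// Rebuild the sample DataFrame against the HiveContext
val clients = Seq((1, "A", 10), (2, "A", 5), (3, "B", 56)).toDF("ID", "Categ", "Amnt")
val result = clients.groupBy("Categ").agg(sum("Amnt").as("Sum"), count("ID").as("count"))
// Persist the result as a managed Hive table; "clients_agg" is a hypothetical name
result.write.mode(SaveMode.Overwrite).saveAsTable("clients_agg")

After saveAsTable, the table appears in the Hive metastore and can be queried from Hive directly.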
Hope this helps!