apache-spark-1.6

Why does importing SparkSession in spark-shell fail with “object SparkSession is not a member of package org.apache.spark.sql”?

送分小仙女 · Submitted on 2019-12-10 23:49:43
Question: I use Spark 1.6.0 on my VM (a Cloudera machine). I'm trying to insert some data into a Hive table from the Spark shell, and to do that I'm trying to use SparkSession. But the import below does not work:

    scala> import org.apache.spark.sql.SparkSession
    <console>:33: error: object SparkSession is not a member of package org.apache.spark.sql
           import org.apache.spark.sql.SparkSession

And without it, I cannot execute this statement:

    val spark = SparkSession.builder.master("local[2]").enableHiveSupport()
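
SparkSession only exists from Spark 2.0 onwards, so this import cannot succeed on Spark 1.6; the 1.x entry points are SQLContext and, for Hive access, HiveContext. A minimal sketch of the Spark 1.6 equivalent, assuming the SparkContext named sc that spark-shell already provides and an illustrative query:

    // Spark 1.6: HiveContext plays the role of the Hive-enabled SparkSession of Spark 2.x.
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)   // `sc` is created by spark-shell
    hiveContext.sql("SHOW TABLES").show()   // illustrative check that Hive tables are reachable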

Spark Streaming application fails with KafkaException: String exceeds the maximum size or with IllegalArgumentException

一世执手 · Submitted on 2019-12-10 19:46:15
Question: TL;DR: My very simple Spark Streaming application fails in the driver with "KafkaException: String exceeds the maximum size". I see the same exception in the executor, and further down the executor's logs I also found an IllegalArgumentException with no other information in it.

Full problem: I'm using Spark Streaming to read some messages from a Kafka topic. This is what I'm doing:

    val conf = new SparkConf().setAppName("testName")
    val streamingContext = new StreamingContext(new
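
The snippet above is cut off in the source; for context, here is a minimal sketch of the kind of setup the question describes, assuming Spark 1.6 with the spark-streaming-kafka (Kafka 0.8 direct API) artifact. The broker address, topic name and batch interval are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("testName")
    val streamingContext = new StreamingContext(conf, Seconds(5))       // placeholder batch interval

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")   // placeholder broker
    val topics = Set("testTopic")                                       // placeholder topic

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      streamingContext, kafkaParams, topics)

    stream.map(_._2).print()   // print the message values of each batch

    streamingContext.start()
    streamingContext.awaitTermination()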

udf No TypeTag available for type string

不问归期 · Submitted on 2019-12-07 21:13:33
Question: I don't understand a behavior of Spark. I create a UDF which returns an Integer, like below:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.{SparkConf, SparkContext}

    object Show {
      def main(args: Array[String]): Unit = {
        val (sc, sqlContext) = iniSparkConf("test")
        val testInt_udf = sqlContext.udf.register("testInt_udf", testInt _)
      }

      def iniSparkConf(appName: String): (SparkContext, SQLContext) = {
        val conf = new SparkConf().setAppName(appName) //.setExecutorEnv("spark.ui.port", "4046")
        val sc = new SparkContext(conf)
        sc.setLogLevel("WARN")
        val sqlContext = new SQLContext(sc)
        (sc,
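
For context, a minimal sketch (with illustrative helper names, not the asker's full code) of registering zero-argument UDFs that return Int and String against a Spark 1.6 SQLContext. The "No TypeTag available for type string" error in the title is what the compiler reports when it cannot materialize a TypeTag for the declared return type, for example when that type is not Scala's String or when the project's Scala version does not match the one Spark was built with:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object UdfSketch {
      def testInt(): Int = 2            // registers fine: TypeTag[Int] is available
      def testString(): String = "ok"   // must be Scala's String for a TypeTag to be found

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("udf-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)

        sqlContext.udf.register("testInt_udf", testInt _)
        sqlContext.udf.register("testString_udf", testString _)

        import sqlContext.implicits._
        val df = sc.parallelize(Seq(1, 2)).toDF("x")
        df.selectExpr("x", "testInt_udf()", "testString_udf()").show()

        sc.stop()
      }
    }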

Why Spark application on YARN fails with FetchFailedException due to Connection refused?

允我心安 · Submitted on 2019-12-06 05:05:43
Question: I am using Spark version 1.6.3 and YARN version 2.7.1.2.3, which comes with HDP-2.3.0.0-2557. Because the Spark version bundled with my HDP release is too old, I prefer to run a different Spark in YARN mode remotely. Here is how I launch the Spark shell:

    ./spark-shell --master yarn-client

Everything seems fine: the sparkContext is initialized, the sqlContext is initialized, and I can even access my Hive tables. But in some cases it gets into trouble when it tries to connect to the block managers. I am not an expert but I
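
The question is cut off here, but for context: a FetchFailedException caused by "Connection refused" during a shuffle generally means an executor could not reach the block manager of the node holding the shuffle data. A hedged sketch of the Spark 1.6 network settings that are often pinned or raised when running spark-shell against a remote YARN cluster; the hostname, ports and timeouts below are placeholders, not recommendations:

    ./spark-shell --master yarn-client \
      --conf spark.driver.host=<host-reachable-from-the-cluster> \
      --conf spark.driver.port=40000 \
      --conf spark.blockManager.port=40010 \
      --conf spark.network.timeout=600s \
      --conf spark.shuffle.io.maxRetries=10 \
      --conf spark.shuffle.io.retryWait=10s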

PySpark serialization EOFError

笑着哭i · Submitted on 2019-11-30 17:01:06
I am reading in a CSV as a Spark DataFrame and performing machine learning operations on it. I keep getting a Python serialization EOFError; any idea why? I thought it might be a memory issue, i.e. the file exceeding available RAM, but drastically reducing the size of the DataFrame didn't prevent the EOF error. Toy code and error below.

    # set spark context
    conf = SparkConf().setMaster("local").setAppName("MyApp")
    sc = SparkContext(conf = conf)
    sqlContext = SQLContext(sc)

    # read in 500mb csv as DataFrame
    df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema=

How to use collect_set and collect_list functions in windowed aggregation in Spark 1.6?

╄→尐↘猪︶ㄣ · Submitted on 2019-11-30 12:48:46
Question: In Spark 1.6.0 / Scala, is there a way to get collect_list("colC") or collect_set("colC").over(Window.partitionBy("colA").orderBy("colB")) ?

Answer 1: Given that you have a dataframe such as

    +----+----+----+
    |colA|colB|colC|
    +----+----+----+
    |1   |1   |23  |
    |1   |2   |63  |
    |1   |3   |31  |
    |2   |1   |32  |
    |2   |2   |56  |
    +----+----+----+

you can use window functions by doing the following:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.expressions._

    df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)

Result:

    +----+----+----+------------+
    |colA|colB|colC
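
Because the answer's output is truncated above, here is a self-contained sketch of the same idea, under the assumption that it runs on Spark 1.6 with a HiveContext (which 1.6 requires for window functions); the object name and the extra collect_set column are illustrative additions:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, collect_list, collect_set}

    object CollectOverWindowSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("collect-window").setMaster("local[2]"))
        val hiveContext = new HiveContext(sc)
        import hiveContext.implicits._

        // same rows as the example table in the answer
        val df = sc.parallelize(Seq(
          (1, 1, 23), (1, 2, 63), (1, 3, 31), (2, 1, 32), (2, 2, 56)
        )).toDF("colA", "colB", "colC")

        val w = Window.partitionBy("colA").orderBy("colB")

        df.withColumn("colD", collect_list(col("colC")).over(w))
          .withColumn("colE", collect_set(col("colC")).over(w))
          .show(false)

        sc.stop()
      }
    }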
