apache-spark-1.5

Convert null values to empty array in Spark DataFrame

Submitted by 守給你的承諾、 on 2019-11-27 22:14:34
I have a Spark data frame where one column is an array of integers. The column is nullable because it is coming from a left outer join. I want to convert all null values to an empty array so I don't have to deal with nulls later. I thought I could do it like so:

val myCol = df("myCol")
df.withColumn("myCol", when(myCol.isNull, Array[Int]()).otherwise(myCol))

However, this results in the following exception:

java.lang.RuntimeException: Unsupported literal type class [I [I@5ed25612
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
    at org.apache.spark.sql.functions$
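
The stack trace indicates that when() tried to wrap Array[Int]() as a literal, and Literal.apply does not support plain Scala arrays in this Spark version. A minimal sketch of one common workaround, assuming the column is array<int> and df is the frame above: produce the empty array through a UDF so it arrives as a Column rather than a literal.

import org.apache.spark.sql.functions.{udf, when}

// UDF that always returns an empty Int array; calling it yields a Column,
// which sidesteps the unsupported array literal.
val emptyIntArray = udf(() => Array.empty[Int])

val myCol = df("myCol")
val patched = df.withColumn("myCol", when(myCol.isNull, emptyIntArray()).otherwise(myCol))

If the only goal is to replace nulls, the same idea can also be written as coalesce(myCol, emptyIntArray()).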

Passing additional jars to Spark via spark-submit

Submitted by ↘锁芯ラ on 2019-11-27 16:25:31
I'm using Spark with MongoDB, and consequently rely on the mongo-hadoop drivers. I got things working thanks to input on my original question here. My Spark job runs, but I receive warnings that I don't understand. When I run this command:

$SPARK_HOME/bin/spark-submit --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop
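
If the warnings are about jars being skipped or not found, one likely cause is the separator: --jars expects a comma-separated list of jar paths, whereas --driver-class-path is an ordinary classpath string joined with colons. Joining the --jars entries with a colon makes spark-submit treat the whole string as a single, non-existent jar. A sketch of the corrected invocation, assuming the same two jar locations and leaving the application itself as a placeholder:

$SPARK_HOME/bin/spark-submit \
  --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar,/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  <your-application> [application arguments]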

Save Spark Dataframe into Elasticsearch - Can’t handle type exception

Submitted by 家住魔仙堡 on 2019-11-27 05:12:55
I have designed a simple job to read data from MySQL and save it in Elasticsearch with Spark. Here is the code:

JavaSparkContext sc = new JavaSparkContext(
    new SparkConf().setAppName("MySQLtoEs")
        .set("es.index.auto.create", "true")
        .set("es.nodes", "127.0.0.1:9200")
        .set("es.mapping.id", "id")
        .set("spark.serializer", KryoSerializer.class.getName()));
SQLContext sqlContext = new SQLContext(sc);

// Data source options
Map<String, String> options = new HashMap<>();
options.put("driver", MYSQL_DRIVER);
options.put("url", MYSQL_CONNECTION_URL);
options.put("dbtable", "OFFERS");
options.put(
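
For reference, a minimal Scala sketch of the same flow using the elasticsearch-hadoop connector (the question's code is Java; the JDBC URL, driver class, and index name below are placeholders, not values from the question). A "Can't handle type" exception from the connector typically means one column's JVM type could not be serialized, so inspecting the schema and casting or dropping the offending column before saving is a common first step.

import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._   // adds saveToEs to DataFrame

val sqlContext = new SQLContext(sc)    // sc: an existing SparkContext, configured as above

// Placeholder JDBC settings; substitute the real MySQL URL, driver, and table.
val offers = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "OFFERS")
  .load()

// "offers/offer" is a placeholder index/type; "id" is assumed to be a column usable as the document id.
offers.saveToEs("offers/offer", Map("es.mapping.id" -> "id"))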
