How to Update an ORC Hive table from Spark using Scala

感情迁移 submitted on 2019-12-23 12:38:34


I would like to update a Hive table that is stored in ORC format. I can update it from my Ambari Hive view, but I cannot run the same UPDATE statement from Scala (spark-shell).

objHiveContext.sql("select * from table_name ") works and I can see the data, but when I run

objHiveContext.sql("update table_name set column_name='testing' ") it fails with some NoViableAltException (invalid syntax near "update", etc.), whereas I can run the same update from the Ambari view (I set all the required configurations, i.e. TBLPROPERTIES "orc.compress"="NONE", transactional true, etc.).

I also tried INSERT INTO with CASE statements, but that did not work either. Can we UPDATE Hive ORC tables from Spark? If so, what is the procedure?

I imported the following:

import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._

Note: I didn't apply any partitioning or bucketing to that table. If I apply bucketing, I can't even view the data when it is stored as ORC. Hive version: 1.2.1, Spark version: 1.4.1, Scala version: 2.10.6.


Have you tried the DataFrame.write API using SaveMode.Append per the link below?

Use "orc" as the format and "append" as the save mode. Examples are in that link.
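In case the link is unavailable, here is a minimal sketch of that suggestion for spark-shell. It assumes `sc` is the SparkContext the shell provides; the sample rows, the `id` column, and the table name `yahoo_orc_table` are placeholders for illustration, not taken from the question. Note that append adds new rows; it does not change rows that are already in the table.

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// `sc` is the SparkContext created by spark-shell.
val objHiveContext = new HiveContext(sc)
import objHiveContext.implicits._

// Hypothetical sample rows; substitute your own DataFrame here.
val myData = sc.parallelize(Seq((1, "testing"), (2, "testing")))
  .toDF("id", "column_name")

// Write the rows in ORC format, appending to the table
// (placeholder table name) instead of replacing its contents.
myData.write
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("yahoo_orc_table")
```

Running objHiveContext.table("yahoo_orc_table").show() afterwards is one way to check that the new rows landed.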


Answer to sudhir's question:

How do I specify the database name while saving?

You can put the database name before the table name. For example, if your database is orc_db and your table is yahoo_orc_table, qualify the table name like this:

myData.write.format("orc").mode(SaveMode.Append).saveAsTable("orc_db.yahoo_orc_table")

