Question
I would like to update a Hive table stored in ORC format. I can run the update from my Ambari Hive view, but I am unable to run the same UPDATE statement from Scala (spark-shell).
objHiveContext.sql("select * from table_name") returns data fine, but when I run
objHiveContext.sql("update table_name set column_name='testing'") it fails with a NoViableAltException (invalid syntax near "update"), even though the same update works from the Ambari view (I have set all the required configuration, i.e. TBLPROPERTIES "orc.compress"="NONE", "transactional"="true", etc.).
I also tried INSERT INTO with CASE statements and similar workarounds, but couldn't get it to work. Can we UPDATE Hive ORC tables from Spark? If yes, what is the procedure?
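The CASE-statement attempt looked roughly like the sketch below (column names and the predicate are hypothetical; it rewrites every row, substituting the new value where the condition matches):

```scala
// Sketch of the INSERT ... CASE attempt (hypothetical id/column_name columns)
objHiveContext.sql("""
  insert overwrite table table_name
  select id,
         case when id = 1 then 'testing' else column_name end as column_name
  from table_name
""")
```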
My imports:
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
Note: I didn't apply any partitioning or bucketing on that table. If I apply bucketing, I can't even view the data when it is stored as ORC.
Hive version: 1.2.1
Spark version: 1.4.1
Scala version: 2.10.6
Answer 1:
Have you tried the DataFrame.write API with SaveMode.Append, per the link below?
http://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options
Use "orc" as the format and "append" as the save mode; examples are at that link.
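In spark-shell terms, that suggestion looks roughly like the sketch below. `myData` is a hypothetical DataFrame holding the rows to add; note this appends rows rather than updating them in place:

```scala
import org.apache.spark.sql.SaveMode

// myData: a DataFrame with the rows you want to add (hypothetical name)
myData.write
  .format("orc")          // match the table's ORC storage format
  .mode(SaveMode.Append)  // add rows instead of overwriting the table
  .saveAsTable("table_name")
```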
Answer 2:
Answer to Sudhir's question:
How do I specify the database name when saving?
You can prefix the table name with the database name. For example, if your database is orc_db and your table is yahoo_orc_table, you can qualify the table name like this:
myData.write.format("orc").mode(SaveMode.Append).saveAsTable("orc_db.yahoo_orc_table")
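If the qualified name gives you trouble on your Spark version, an alternative sketch (same hypothetical myData, and assuming a HiveContext named objHiveContext as in the question) is to switch the current database first:

```scala
import org.apache.spark.sql.SaveMode

objHiveContext.sql("use orc_db")   // make orc_db the current database
myData.write
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("yahoo_orc_table")  // resolved against orc_db
```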
Source: https://stackoverflow.com/questions/34534610/how-to-updata-an-orc-hive-table-form-spark-using-scala