Question
I would like to update a Hive table stored in ORC format. I can run the update from my Ambari Hive view, but I am unable to run the same UPDATE statement from Scala (spark-shell).
objHiveContext.sql("select * from table_name") returns data fine, but when I run
objHiveContext.sql("update table_name set column_name='testing'") it fails with a NoViableAltException (invalid syntax near "update"), even though the same update works from the Ambari view (I have set all the required configuration, i.e. TBLPROPERTIES "orc.compress"="NONE", "transactional"="true", etc.).
I also tried INSERT INTO with CASE statements and similar workarounds, but couldn't get it to work. Can we UPDATE Hive ORC tables from Spark? If yes, what is the procedure?
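The CASE-statement attempt looked roughly like the sketch below (column names and the predicate are hypothetical; it rewrites every row, substituting the new value where the condition matches):

```scala
// Sketch of the INSERT ... CASE attempt (hypothetical id/column_name columns)
objHiveContext.sql("""
  insert overwrite table table_name
  select id,
         case when id = 1 then 'testing' else column_name end as column_name
  from table_name
""")
```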
My imports:
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
Note: I didn't apply any partitioning or bucketing on that table. If I apply bucketing, I can't even view the data when it is stored as ORC.
Hive version: 1.2.1
Spark version: 1.4.1
Scala version: 2.10.6
Answer 1:
Have you tried the DataFrame.write API with SaveMode.Append, per the link below?
http://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options
Use "orc" as the format and "append" as the save mode; examples are at that link.
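In spark-shell terms, that suggestion looks roughly like the sketch below. `myData` is a hypothetical DataFrame holding the rows to add; note this appends rows rather than updating them in place:

```scala
import org.apache.spark.sql.SaveMode

// myData: a DataFrame with the rows you want to add (hypothetical name)
myData.write
  .format("orc")          // match the table's ORC storage format
  .mode(SaveMode.Append)  // add rows instead of overwriting the table
  .saveAsTable("table_name")
```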
Answer 2:
Answer to Sudhir's question:
How do I specify the database name when saving?
You can prefix the table name with the database name. For example, if your database is orc_db and your table is yahoo_orc_table, you can qualify the table name like this:
myData.write.format("orc").mode(SaveMode.Append).saveAsTable("orc_db.yahoo_orc_table")
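If the qualified name gives you trouble on your Spark version, an alternative sketch (same hypothetical myData, and assuming a HiveContext named objHiveContext as in the question) is to switch the current database first:

```scala
import org.apache.spark.sql.SaveMode

objHiveContext.sql("use orc_db")   // make orc_db the current database
myData.write
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("yahoo_orc_table")  // resolved against orc_db
```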
Source: https://stackoverflow.com/questions/34534610/how-to-updata-an-orc-hive-table-form-spark-using-scala