Question
Case: I have a table HiveTest, which is an ORC table with transactions enabled (transactional=true). I loaded it in the Spark shell and viewed the data:
scala> var rdd = objHiveContext.sql("select * from HiveTest")
scala> rdd.show()
-- able to view the data
Then I went to my Hive shell (or Ambari) and updated the table, for example:
hive> update HiveTest set name='test'; -- done successfully
hive> select * from HiveTest; -- able to view the updated data
Now when I come back to Spark and run the query again, I cannot view any data except the column names:
scala> var rdd1 = objHiveContext.sql("select * from HiveTest")
scala> rdd1.show()
-- this time only the columns are printed; the data does not come back
Issue 2: I am unable to update from Spark SQL. When I run scala> objHiveContext.sql("update HiveTest set name='test'"), I get the error below:
org.apache.spark.sql.AnalysisException:
Unsupported language features in query: INSERT INTO HiveTest values(1,'sudhir','Software',1,'IT')
TOK_QUERY 0, 0,17, 0
  TOK_FROM 0, -1,17, 0
    TOK_VIRTUAL_TABLE 0, -1,17, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 6,17, 28
        TOK_VALUE_ROW 1, 7,17, 28
          1 1, 8,8, 28
          'sudhir' 1, 10,10, 30
          'Software' 1, 12,12, 39
          1 1, 14,14, 50
          'IT' 1, 16,16, 52
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,4, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          HiveTest 1, 4,4, 12
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0
scala.NotImplementedError: No parse rules for:
TOK_VIRTUAL_TABLE 0, -1,17, 0
  TOK_VIRTUAL_TABREF 0, -1,-1, 0
    TOK_ANONYMOUS 0, -1,-1, 0
  TOK_VALUES_TABLE 1, 6,17, 28
    TOK_VALUE_ROW 1, 7,17, 28
      1 1, 8,8, 28
      'sudhir' 1, 10,10, 30
      'Software' 1, 12,12, 39
      1 1, 14,14, 50
      'IT' 1, 16,16, 52
org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1235)
This error is for the INSERT INTO statement; I get the same sort of error for the UPDATE statement as well.
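For reference, the error comes from the HiveQl parser rejecting the VALUES clause, so one way around it is to skip the SQL parser entirely and append a DataFrame. This is only a sketch under assumptions: the column names and types below are guesses, not taken from the question, and whether the append works against an ACID/transactional ORC table depends on the Hive and Spark versions in play.
// Sketch only: bypass the SQL parser by appending a DataFrame.
// Column names and types are assumptions, not from the question.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}

val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType),
  StructField("role", StringType),
  StructField("deptId", IntegerType),
  StructField("dept", StringType)))
// Build a one-row RDD[Row] and append it to the Hive table.
val newRow = objHiveContext.sparkContext.parallelize(Seq(Row(1, "sudhir", "Software", 1, "IT")))
objHiveContext.createDataFrame(newRow, schema).write.mode("append").insertInto("HiveTest")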
Answer 1:
Have you tried objHiveContext.refreshTable("HiveTest")?
Spark SQL aggressively caches Hive metastore data.
If an update happens outside of Spark SQL, you might experience some unexpected results as Spark SQL's version of the Hive metastore is out of date.
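A minimal sketch of that fix, reusing objHiveContext and the table name from the question:
// Invalidate Spark SQL's cached metadata and data for the table,
// then re-run the query so rows updated outside Spark become visible.
objHiveContext.refreshTable("HiveTest")
val rdd2 = objHiveContext.sql("select * from HiveTest")
rdd2.show()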
Here's some more info:
http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
The docs mostly mention Parquet, but this likely applies to ORC and other file formats.
With JSON, for example, if you add new files into a directory outside of Spark SQL, you'll need to call hiveContext.refreshTable() within Spark SQL to see the new data.
Answer 2:
Spark SQL does not support UPDATE and DELETE statements as of now; however, INSERT can still be done.
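Concretely, the INSERT form that the HiveContext parser accepts is INSERT INTO ... SELECT rather than INSERT INTO ... VALUES. A hedged sketch, where the staging table name is assumed:
// Assumed: HiveTest_staging exists with the same schema as HiveTest.
// HiveContext can parse this SELECT-based form, unlike the VALUES syntax.
objHiveContext.sql("INSERT INTO TABLE HiveTest SELECT * FROM HiveTest_staging")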
Source: https://stackoverflow.com/questions/34661547/unable-to-view-data-of-hive-tables-after-update-in-spark