I'm trying the JSON SerDe from the link below: http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double);
I solved a similar problem as follows.
I took the jar from http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar
Then I registered it in the Hive CLI: ADD JAR /path/to/jar;
create table messages (
id int,
creation_date string,
text string,
loggedInUser STRUCT<id:INT, name: STRING>
)
row format serde "org.openx.data.jsonserde.JsonSerDe";
{"id": 1,"creation_date": "2020-03-01","text": "I am on controller","loggedInUser":{"id":1,"name":"API"}}
{"id": 2,"creation_date": "2020-04-01","text": "I am on service","loggedInUser":{"id":1,"name":"API"}}
LOAD DATA LOCAL INPATH '${env:HOME}/path-to-json'
OVERWRITE INTO TABLE messages;
select * from messages;
1 2020-03-01 I am on controller {"id":1,"name":"API"}
2 2020-04-01 I am on service {"id":1,"name":"API"}
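To see what the SerDe is doing with each row, here is a minimal Python sketch (an illustration only, not how JsonSerDe is implemented): each line of the file is parsed as one standalone JSON document, top-level keys map to columns, and nested objects map to the STRUCT column.

```python
import json

# Two sample records, one complete JSON document per line, as the SerDe expects.
lines = [
    '{"id": 1, "creation_date": "2020-03-01", "text": "I am on controller", "loggedInUser": {"id": 1, "name": "API"}}',
    '{"id": 2, "creation_date": "2020-04-01", "text": "I am on service", "loggedInUser": {"id": 1, "name": "API"}}',
]

rows = []
for line in lines:
    doc = json.loads(line)  # each line must parse as a standalone JSON object
    rows.append((doc["id"], doc["creation_date"], doc["text"], doc["loggedInUser"]))

# Nested JSON objects come back as dicts, mirroring the STRUCT<id:INT, name:STRING> column.
print(rows[0][3]["name"])  # -> API
```

This is also why the file must not be a single JSON array: the record boundaries are the line breaks.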
For JSON parsing, based on the Hive cwiki/Confluence documentation, we need to follow a few steps:
Download hive-hcatalog-core.jar, then register it:
hive> ADD JAR /path/hive-hcatalog-core.jar;
create table tablename (colname1 datatype, ...) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' stored as textfile;
The column names in the table definition must match the key names in test.json; if they don't, those columns will show NULL values. Hope this helps.
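The NULL behaviour can be illustrated with a small Python sketch (hypothetical, just mirroring what a lookup by column name does): a column name that matches a JSON key gets the value, and a name with no matching key resolves to nothing, which Hive surfaces as NULL.

```python
import json

record = json.loads('{"field1": "data1", "field2": 100}')

# A column whose name matches a JSON key receives the value...
matched = record.get("field1")    # "data1"
# ...while a column name with no matching key yields None (NULL in Hive).
missing = record.get("colname1")  # None

print(matched, missing)
```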
First of all, validate your JSON file on http://jsonlint.com/. Then reformat the file to one record per line; removing the surrounding [ ] and the trailing comma on each line is mandatory.
[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}, {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}, {"field1":"data3","field2":300,"field3":"more data3","field4":123.003}, {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}]
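That clean-up step can be scripted. The sketch below (an illustrative helper, not part of the SerDe) re-serializes each element of the validated array, which drops the surrounding brackets and the trailing commas and leaves one record per line.

```python
import json

# The validated file content: a single JSON array of records.
raw = ('[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001},'
       ' {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}]')

# Re-serializing each element drops the surrounding [ ] and the trailing
# commas, leaving the one-record-per-line layout the SerDe needs.
ndjson = "\n".join(json.dumps(record) for record in json.loads(raw))
print(ndjson)
```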
In my test I added hive-json-serde-0.2.jar from the Hadoop cluster; I think hive-json-serde-0.1.jar should also work.
ADD JAR hive-json-serde-0.2.jar;
Create your table
CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
Load your JSON data file; here I load it from the Hadoop cluster (HDFS), not from the local filesystem:
LOAD DATA INPATH 'Test2.json' INTO TABLE my_table;
My test
It's a bit hard to tell what's going on without the logs (see Getting Started in case of doubt). Just a quick thought - can you try whether it works with WITH SERDEPROPERTIES, like so:
CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
"field1"="$.field1",
"field2"="$.field2",
"field3"="$.field3",
"field4"="$.field4"
);
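Each SERDEPROPERTIES entry maps a column to a JSONPath-style expression such as "$.field1". The toy Python extractor below (hypothetical, handling only the simple "$.a.b" form) illustrates how such a mapping resolves columns against a parsed record.

```python
import json

def extract(doc, path):
    """Resolve a minimal '$.a.b' style path against a parsed JSON document."""
    value = doc
    for key in path.lstrip("$.").split("."):
        value = value[key]
    return value

doc = json.loads('{"field1": "data1", "field2": 100, "field4": 123.001}')
# Column -> path mapping, in the spirit of the SERDEPROPERTIES above.
mapping = {"field1": "$.field1", "field2": "$.field2"}
row = {col: extract(doc, p) for col, p in mapping.items()}
print(row)
```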
There is also a fork from ThinkBigAnalytics you might want to give a try.
UPDATE: It turns out the input in Test.json is invalid JSON, hence the records get collapsed.
See the answer at https://stackoverflow.com/a/11707993/396567 for further details.