Using JSON-SerDe in Hive tables

被撕碎了的回忆 2021-01-02 21:57

I'm trying the JSON-SerDe from the link below: http://code.google.com/p/hive-json-serde/wiki/GettingStarted.

         CREATE TABLE my_table (field1 string, field2          


        
4 answers
  • 2021-01-02 22:05

    I solved a similar problem as follows:

    1. I took the jar from [http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar]

    2. Ran this command in the Hive CLI: add jar /path/to/jar

    3. Created the table using:
    create table messages (
        id int,
        creation_date string,
        text string,
        loggedInUser STRUCT<id:INT, name: STRING>
    )
    row format serde "org.openx.data.jsonserde.JsonSerDe";
    
    4. This is my JSON data:
    {"id": 1,"creation_date": "2020-03-01","text": "I am on cotroller","loggedInUser":{"id":1,"name":"API"}}
    {"id": 2,"creation_date": "2020-04-01","text": "I am on service","loggedInUser":{"id":1,"name":"API"}}
    
    5. Loaded the data into the table using:
    LOAD DATA LOCAL INPATH '${env:HOME}/path-to-json'
    OVERWRITE INTO TABLE messages;
    
    6. select * from messages;
    1   2020-03-01    I am on cotroller   {"id":1,"name":"API"}
    2   2020-04-01    I am on service     {"id":1,"name":"API"}
    
  • 2021-01-02 22:16

    For JSON parsing, based on the Hive wiki (cwiki/Confluence), we need to follow these steps:

    1. Download hive-hcatalog-core.jar.

    2. hive> add jar /path/hive-hcatalog-core.jar

    3. create table tablename(colname1 datatype,.....) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' stored as ORCFILE;

    4. The column names in the CREATE TABLE statement must match the key names in test.json; otherwise the columns will show NULL values. Hope this helps.
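Step 4 can be checked before loading the data. The following is a minimal Python sketch, not part of the original answer: `missing_columns` is a hypothetical helper name, and the case-insensitive comparison is an assumption based on Hive storing column names in lower case; whether a particular SerDe matches keys the same way should be verified separately.

```python
import json

def missing_columns(json_line, columns):
    """Return the declared columns that have no matching key in one JSON record."""
    record = json.loads(json_line)
    # Assumption: compare case-insensitively, because Hive lower-cases column names.
    keys = {key.lower() for key in record}
    return [col for col in columns if col.lower() not in keys]

# Hypothetical sample record and column list, for illustration only.
line = '{"field1": "data1", "Field2": 100}'
missing = missing_columns(line, ["field1", "field2", "field3"])
print(missing)
```

Any column reported here is one that would likely come back as NULL for that record.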

  • 2021-01-02 22:21
    1. First of all, validate your JSON file at http://jsonlint.com/. After that, put one record per line and remove the [ ]. The comma at the end of each line is mandatory.

      [{"field1":"data1","field2":100,"field3":"more data1","field4":123.001},
      {"field1":"data2","field2":200,"field3":"more data2","field4":123.002},
      {"field1":"data3","field2":300,"field3":"more data3","field4":123.003},
      {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}]

    2. In my test I added hive-json-serde-0.2.jar from the Hadoop cluster; I think hive-json-serde-0.1.jar should also be OK.

      ADD JAR hive-json-serde-0.2.jar;

    3. Create your table

      CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;

    4. Load your JSON data file. Here I load it from the Hadoop cluster, not from the local file system.

      LOAD DATA INPATH 'Test2.json' INTO TABLE my_table;
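The "one row per line, without the [ ]" preparation in step 1 can be sketched in Python as below. This is only an illustration under stated assumptions: `array_to_json_lines` and its `line_suffix` parameter are hypothetical names, and the default output has no trailing comma. If the SerDe version you use needs the trailing comma mentioned in step 1, pass `line_suffix=","`.

```python
import json

def array_to_json_lines(text, line_suffix=""):
    """Turn a JSON array of objects into one compact JSON object per line."""
    records = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep every record on a single physical line.
    lines = [json.dumps(rec, separators=(",", ":")) + line_suffix for rec in records]
    return "\n".join(lines) + "\n"

# Two records taken from the sample data in step 1.
sample = (
    '[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}, '
    '{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}]'
)
json_lines = array_to_json_lines(sample)
print(json_lines, end="")
```

The result can be written to a file such as Test2.json and then loaded with the LOAD DATA statement from step 4.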


  • 2021-01-02 22:27

    It's a bit hard to tell what's going on without the logs (see Getting Started if in doubt). Just a quick thought: can you check whether it works with WITH SERDEPROPERTIES, like so:

    CREATE EXTERNAL TABLE my_table (field1 string, field2 int, 
                                    field3 string, field4 double)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
    WITH SERDEPROPERTIES (
      "field1"="$.field1",
      "field2"="$.field2",
      "field3"="$.field3",
      "field4"="$.field4" 
    );
    

    There is also a fork from ThinkBigAnalytics that you might want to try.

    UPDATE: It turns out the input in Test.json is invalid JSON, hence the records get collapsed.

    See answer https://stackoverflow.com/a/11707993/396567 for further details.
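Since the root cause was invalid input, it may help to check each line of the file before loading it into Hive. Below is a minimal Python sketch, assuming the file is meant to hold one JSON object per line; `invalid_lines` is a hypothetical helper, not something provided by the SerDe.

```python
import json

def invalid_lines(text):
    """Return (line_number, reason) for each non-blank line that is not a JSON object."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, err.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems

# Hypothetical sample: line 2 is truncated, line 3 is an array instead of an object.
sample_text = (
    '{"field1":"data1","field2":100}\n'
    '{"field1":"data2","field2":\n'
    '[{"field1":"data3"}]\n'
)
problems = invalid_lines(sample_text)
for number, reason in problems:
    print(number, reason)
```

An empty result means every non-blank line parsed as a standalone JSON object, which is the shape the tables in this thread expect.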
