Hive: parsing JSON

遥遥无期 2021-02-04 04:11

I am trying to get some values out of nested JSON for millions of rows (5 TB+ table). What is the most efficient way to do this?

Here is an example:

{\"c         


        
4 Answers
  •  清歌不尽
    2021-02-04 04:40

    You can do this with Hive's native JSON SerDe ('org.apache.hive.hcatalog.data.JsonSerDe'). Here are the steps. First, add the hive-hcatalog-core JAR:

    ADD JAR /path/to/hive-hcatalog-core.jar;

    Create a table whose columns mirror the JSON structure, using nested structs for the nested objects:

    CREATE TABLE json_serde_nestedjson (
      country string,
      page int,
      data struct<ad:struct<impressions:struct<s:int, o:int>>>
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
    

    Then load the data (stored in a file, with one JSON object per line):

    LOAD DATA LOCAL INPATH '/tmp/nested.json' INTO TABLE json_serde_nestedjson;
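
    For a table as large as the one in the question (5 TB+), copying a local file in with LOAD DATA LOCAL may not be practical. If the JSON files already sit in HDFS, one option is an external table over that directory. This is only a sketch: the table name json_serde_nestedjson_ext and the LOCATION path are placeholders that do not come from the original answer, and it assumes one JSON object per line.

    ```sql
    -- Hypothetical external table; name and LOCATION are placeholders.
    -- Hive reads the files in place, so no data is moved or copied.
    CREATE EXTERNAL TABLE json_serde_nestedjson_ext (
      country string,
      page int,
      data struct<ad:struct<impressions:struct<s:int, o:int>>>
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE
    LOCATION '/path/to/json/dir';
    ```

    Dropping an external table removes only the metadata, so the underlying JSON files stay where they are.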
    

    Then get the required data using:

    SELECT country, page, data.ad.impressions.s, data.ad.impressions.o 
    FROM json_serde_nestedjson;  
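
    Because the table is backed by JSON text, the SerDe parses every row again on each query. If the same fields are queried repeatedly, it may be worth extracting them once into a columnar table. A minimal sketch, assuming ORC is available in your Hive installation; the table name impressions_flat and the column aliases are hypothetical:

    ```sql
    -- Hypothetical target table: flattens the nested struct fields once,
    -- so later queries read ORC columns instead of re-parsing JSON.
    CREATE TABLE impressions_flat
    STORED AS ORC
    AS
    SELECT country,
           page,
           data.ad.impressions.s AS impressions_s,
           data.ad.impressions.o AS impressions_o
    FROM json_serde_nestedjson;
    ```

    Later queries can then select from impressions_flat directly. Whether the one-time conversion pays off depends on how often the data is read.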
    
