Hive: parsing JSON

遥遥无期 2021-02-04 04:11

I am trying to get some values out of nested JSON for millions of rows (5 TB+ table). What is the most efficient way to do this?

Here is an example:

{\"c         


        
4 answers
  •  执念已碎
    2021-02-04 04:41

    Here is something you can try quickly. I would suggest using a JSON SerDe.

    Create a sample file containing one JSON record:

    nano /tmp/hive-parsing-json.json

    {"country":"US","page":227,"data":{"ad":{"impressions":{"s":10,"o":10}}}}
    

    Create the base table:

    hive > CREATE TABLE hive_parsing_json_table ( json string );
    

    Load the JSON file into the table:

    hive > LOAD DATA LOCAL INPATH  '/tmp/hive-parsing-json.json' INTO TABLE hive_parsing_json_table;
    

    Query the table:

    hive > SELECT v1.Country, v1.Page, v4.impressions_s, v4.impressions_o
    FROM hive_parsing_json_table hpjp
         LATERAL VIEW json_tuple(hpjp.json, 'country', 'page', 'data') v1
         AS Country, Page, data
         LATERAL VIEW json_tuple(v1.data, 'ad') v2
         AS Ad
         LATERAL VIEW json_tuple(v2.Ad, 'impressions') v3
         AS Impressions
         LATERAL VIEW json_tuple(v3.Impressions, 's', 'o') v4
         AS impressions_s, impressions_o;
    

    Output:

    v1.country  v1.page  v4.impressions_s  v4.impressions_o
    US          227      10                10
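
    The steps above use json_tuple on a plain string column, while the opening suggestion mentions a JSON SerDe. A minimal sketch of that SerDe route follows, assuming the HCatalog JSON SerDe (org.apache.hive.hcatalog.data.JsonSerDe) is available on your cluster. The table name hive_parsing_json_serde is made up for illustration, and the column types are guessed from the single sample record.

    ```sql
    -- Assumption: the hive-hcatalog-core jar is on the classpath
    -- (on some installations it has to be added with ADD JAR first).
    -- hive_parsing_json_serde is a hypothetical table name.
    CREATE TABLE hive_parsing_json_serde (
      country STRING,
      page    INT,
      `data`  STRUCT<ad:STRUCT<impressions:STRUCT<s:INT,o:INT>>>
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE;

    -- Load the same one-record sample file used above.
    LOAD DATA LOCAL INPATH '/tmp/hive-parsing-json.json'
    INTO TABLE hive_parsing_json_serde;

    -- Nested fields are reached with dot notation on the struct column.
    SELECT country,
           page,
           `data`.ad.impressions.s AS impressions_s,
           `data`.ad.impressions.o AS impressions_o
    FROM hive_parsing_json_serde;
    ```

    With this layout the schema is declared once and queries no longer need a chain of LATERAL VIEW clauses. Whether it is actually faster than json_tuple on a 5 TB table is something you would have to measure on your own data.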
    
