Parse JSON arrays using Hive

野趣味 2020-12-05 08:04

I have many JSON arrays stored in a table (jt) that look like this:

[{\"ts\":1403781896,\"id\":14,\"log\":\"show\"},{\"ts\":1403781896,\"id\":14,\"log\":\"         

3 Answers
  • 2020-12-05 08:46

    Because get_json_object doesn't support a JSON array string at the top level, you can wrap the array in a JSON object with concat(), like this:

    SELECT
        get_json_object(concat('{"root":', jt.value, '}'), '$.root')
    FROM jt;
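A rough way to see why the wrapping helps, sketched in Python rather than HiveQL: Python's json module stands in for get_json_object, and the sample value is a made-up row in the question's format (assuming the stored string uses plain quotes, not the backslash escapes shown in the question). Once the array sits under a "root" key, the whole value is a JSON object, and the "root" path leads back to the array.

```python
import json


def wrap_and_parse(json_array_str):
    """Wrap a top-level JSON array in {"root": ...} and pull the array back out."""
    # Mirrors concat('{"root":', value, '}') in the Hive query.
    wrapped = '{"root":' + json_array_str + '}'
    # json.loads + ["root"] stands in for get_json_object(wrapped, '$.root').
    return json.loads(wrapped)["root"]


# Hypothetical sample value in the question's format.
value = '[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]'
records = wrap_and_parse(value)
print(len(records), records[0]["log"])  # 2 show
```

Keep in mind that in Hive, get_json_object always returns a string, so '$.root' hands the array back as one string value; you still need something like explode() to get one row per record.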
    
  • 2020-12-05 08:54

    Use the explode() function:

     hive (default)> CREATE TABLE logs AS
                      >   SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
                      >   get_json_object(single_json_table.single_json, '$.id') AS id,
                      >   get_json_object(single_json_table.single_json, '$.log') AS log
                      >   FROM
                      >     (SELECT explode(json_array_col) as single_json FROM jt) single_json_table ;
    
    Automatically selecting local only mode for query
    Total MapReduce jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    
    hive (default)> select * from logs;
    OK
    ts      id      log
    1403781896      14      show
    1403781896      14      start
    1403781911      14      press
    1403781911      14      press
    Time taken: 0.118 seconds, Fetched: 4 row(s)
    hive (default)>
    

    where json_array_col is the column in jt that holds your array of JSON strings (explode() needs an array, so here the column has type array<string>):

    hive (default)> select json_array_col from jt;
    json_array_col
    ["{"ts":1403781896,"id":14,"log":"show"}","{"ts":1403781896,"id":14,"log":"start"}"]
    ["{"ts":1403781911,"id":14,"log":"press"}","{"ts":1403781911,"id":14,"log":"press"}"]
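As a hedged illustration in Python (not Hive), the sketch below models jt as a list of rows whose json_array_col is a list of JSON strings, mirroring array<string>. The inner loop plays the role of explode() (one output row per element), and json.loads plus str() plays get_json_object(), which returns strings. The sample rows are the ones shown in the output above; explode_logs is a hypothetical name.

```python
import json

# Hypothetical stand-in for table jt: each row's json_array_col is a list of
# JSON strings, i.e. what Hive stores as array<string>.
jt = [
    ['{"ts":1403781896,"id":14,"log":"show"}', '{"ts":1403781896,"id":14,"log":"start"}'],
    ['{"ts":1403781911,"id":14,"log":"press"}', '{"ts":1403781911,"id":14,"log":"press"}'],
]


def explode_logs(rows):
    """Yield one (ts, id, log) tuple per array element, like explode() + get_json_object()."""
    for json_array_col in rows:
        for single_json in json_array_col:  # explode(): one output row per element
            obj = json.loads(single_json)
            # get_json_object() returns strings, so convert each field to str.
            yield str(obj["ts"]), str(obj["id"]), str(obj["log"])


logs = list(explode_logs(jt))
for row in logs:
    print("\t".join(row))
```

This prints the same four ts/id/log rows as the select * from logs query above.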
    
  • 2020-12-05 08:55

    I just ran into this problem, with the JSON array stored as a string in the Hive table.

    The solution is a bit hacky and ugly, but it works and doesn't require SerDes or external UDFs:

    SELECT 
           get_json_object(single_json_table.single_json, '$.ts') AS ts,
           get_json_object(single_json_table.single_json, '$.id') AS id,
           get_json_object(single_json_table.single_json, '$.log') AS log
    FROM ( SELECT explode (
      split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
                '"}","', '"}",,,,"'), ',,,,')
          ) AS single_json FROM src_table) single_json_table;
    

    I broke the query across several lines to make it easier to read. substr() strips the first and last characters, removing the [ and ]. regexp_replace() then matches the separator between records in the JSON array and replaces it with something unique (here ,,,,). split() uses that marker to turn the string into a Hive array of JSON strings, which can then be passed to explode() as described in the previous answer.

    Note that the separator regex used here ( "}"," ) wouldn't work with the original data set. For that data the regex would have to be ( "},\{" ) and the replacement "},,,,{", e.g.:

      split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
                '"},\\{"', '"},,,,{"'), ',,,,')
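To sanity-check the corrected regex against the question's format, here is a Python approximation of the substr / regexp_replace / split steps. Python's re module stands in for Hive's Java regexes, the function name split_json_array is hypothetical, and the sample row assumes plain quotes rather than the backslash escapes shown in the question. Like the Hive version, it assumes the marker ,,,, never appears in the data and that each record ends with a string value, since the regex requires a quote right before the closing brace.

```python
import json
import re


def split_json_array(json_array_col):
    """Approximate split(regexp_replace(substr(...), ...), ',,,,') on one row."""
    # substr(col, 2, length(col) - 2): Hive's substr is 1-based, so this drops '[' and ']'.
    inner = json_array_col[1:len(json_array_col) - 1]
    # regexp_replace: rewrite each record separator "},{" as "},,,,{".
    marked = re.sub(r'"},\{"', '"},,,,{"', inner)
    # split: break the string on the ',,,,' marker (treated as a regex, as in Hive).
    return re.split(",,,,", marked)


# Hypothetical sample row in the question's format.
row = '[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]'
for single_json in split_json_array(row):  # explode()
    obj = json.loads(single_json)  # get_json_object(single_json, '$.ts') and so on
    print(obj["ts"], obj["id"], obj["log"])
```

Each element that comes out of the split is a complete JSON object string, which is why get_json_object can read it after explode().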
    