I am trying to get some values out of nested JSON for millions of rows (5 TB+ table). What is the most efficient way to do this?
Here is an example:
You can use get_json_object:
select get_json_object(fieldname, '$.country'),
get_json_object(fieldname, '$.data.ad.s') from ...
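You will get better performance with json_tuple when you need several keys from the same JSON string, but it only reads top-level keys, so values in JSON nested inside JSON need an extra step. To turn the nested array into rows of your table, you can use something like this: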
from table t lateral view
explode(split(regexp_replace(get_json_object(ln, '$.data.ad.s'), '\\[|\\]', ''), ',')) tb1 as s
The code above strips the square brackets, splits on commas, and turns your array into a column with one row per element.
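For the json_tuple route mentioned above, here is a minimal sketch. The table name events and the column id are hypothetical; fieldname is the JSON column from the first query. Since json_tuple only sees top-level keys, the nested value is reached by chaining one lateral view per level:

```sql
-- Sketch only: events and id are assumed names, not from the original question.
SELECT t.id,
       l1.country,
       l3.s
FROM events t
-- Level 1: pull 'country' and the nested 'data' object in a single call.
LATERAL VIEW json_tuple(t.fieldname, 'country', 'data') l1 AS country, data_json
-- Level 2: json_tuple returns strings, so the nested object can be parsed again.
LATERAL VIEW json_tuple(l1.data_json, 'ad') l2 AS ad_json
-- Level 3: l3.s holds the same value as get_json_object(fieldname, '$.data.ad.s').
LATERAL VIEW json_tuple(l2.ad_json, 's') l3 AS s;
```

Whether this beats get_json_object on a table of this size depends on how many keys you extract per row, so it is worth timing both on a sample first.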
For more: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
I hope this helps.