Hive: How to explode a JSON column embedded in a CSV file?

Submitted by 馋奶兔 on 2019-12-10 17:25:52

Question


I have a CSV file (with a header row and a pipe delimiter) whose last column contains JSON with a collection inside. The file content comes in one of the two following forms:

First case (with a JSON collection with no name):

ProductId|IngestTime|ProductOrders
9180|20171025145034|[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251|20171026114034|[{"OrderId":"1799","Location":"London"}]

Second case (with a JSON collection named "Orders"):

ProductId|IngestTime|ProductOrders
9180|20171025145034|{"Orders":[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]}
8251|20171026114034|{"Orders":[{"OrderId":"1799","Location":"London"}]}

First of all, I create my "raw" table like this:

DROP TABLE IF EXISTS Product;
CREATE EXTERNAL TABLE Product (
  ProductId STRING,
  IngestTime STRING,
  ProductOrders STRING
)
COMMENT "Product raw table"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\|'
STORED AS TEXTFILE
LOCATION
  '/data/product'
TBLPROPERTIES ("skip.header.line.count"="1");

When I query my table with:

SELECT * FROM Product;

I get the following result:

First case (with a JSON collection with no name):

ProductId  IngestTime      ProductOrders
9180       20171025145034  [{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251       20171026114034  [{"OrderId":"1799","Location":"London"}]

Second case (with a JSON collection named "Orders"):

ProductId  IngestTime      ProductOrders
9180       20171025145034  {"Orders":[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]}
8251       20171026114034  {"Orders":[{"OrderId":"1799","Location":"London"}]}

OK, so far this works well.

What I need now is a SELECT query that returns one row per order:

ProductId  IngestTime      ProductOrderId ProductLocation
9180       20171025145034  299            NY
9180       20171025145034  499            LA
8251       20171026114034  1799           London

I really need a portable SQL query that works for both of my cases (with or without the "Orders" name around the collection).

So far I have tried many combinations of 'explode', 'get_json_object' and so on, but I still haven't found the right SQL query.

Thanks a lot for your help :-)


Answer 1:


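You can try declaring the orders column as an array of structs and selecting its fields. The DDL below shows only the column types (no row format or SerDe is specified), and the [0] index returns only the first order of each row:

CREATE EXTERNAL TABLE product (
  productid STRING,
  ingesttime STRING,
  productorders ARRAY<STRUCT<orderid:STRING, location:STRING>>
);

SELECT productid,
       ingesttime,
       productorders.orderid[0] AS orderid,
       productorders.location[0] AS location
FROM product;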
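
To get one row per order in both cases, another option is to keep the raw Product table from the question (where ProductOrders is a plain STRING) and explode the JSON text directly. The following is only a sketch, not a tested solution: the aliases p, o and order_json are made up for illustration, and it assumes every order is a flat object with no nested objects or arrays and no "},{" sequence inside a value. It takes the text between the first '[' and the last ']', which is the same for the unnamed array and for the "Orders" wrapper, then splits it on the commas that sit between '}' and '{'.

```sql
-- Sketch against the raw Product table (ProductOrders is a STRING).
-- Assumption: flat order objects, no "},{" inside any value.
-- regexp_extract keeps the text between the first '[' and the last ']'.
-- split uses lookbehind/lookahead so the braces of each object are kept.
SELECT p.ProductId,
       p.IngestTime,
       get_json_object(o.order_json, '$.OrderId')  AS ProductOrderId,
       get_json_object(o.order_json, '$.Location') AS ProductLocation
FROM Product p
LATERAL VIEW explode(
  split(
    regexp_extract(p.ProductOrders, '\\[(.*)\\]', 1),
    '(?<=\\}),(?=\\{)'
  )
) o AS order_json;
```

On the sample data this should produce the three rows asked for in the question (299/NY, 499/LA and 1799/London). If the JSON can contain whitespace between objects or nested structures, the split pattern would need to be adapted, or a JSON SerDe used instead.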



Source: https://stackoverflow.com/questions/46935265/hive-how-to-explode-a-json-column-embedded-in-a-csv-file
