问题
From a CSV file (with a header and a pipe delimiter) I've got the two following contents which contain a JSON column (with a collection inside), like this:
First case (with a JSON collection with no name):
ProductId|IngestTime|ProductOrders
9180|20171025145034|[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251|20171026114034|[{"OrderId":"1799","Location":"London"}]
Second case (with a JSON collection named "Orders"):
ProductId|IngestTime|ProductOrders
9180|20171025145034|{"Orders":[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]}
8251|20171026114034|{"Orders":[{"OrderId":"1799","Location":"London"}]}
Firstable, I create my "raw" table like this:
DROP TABLE IF EXISTS Product;
CREATE EXTERNAL TABLE Product (
ProductId STRING,
IngestTime STRING,
ProductOrders STRING
)
COMMENT "Product raw table"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\|'
STORED AS TEXTFILE
LOCATION
'/data/product'
TBLPROPERTIES ("skip.header.line.count"="1");
When I query my table with:
SELECT * FROM Product
I've got the following answer:
First case (with a JSON collection with no name):
ProductId IngestTime ProductOrders
9180 20171025145034 [{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251 20171026114034 [{"OrderId":"1799","Location":"London"}]
Second case (with a JSON collection named "Orders"):
ProductId IngestTime ProductOrders
9180 20171025145034 {"Orders":[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]}
8251 20171026114034 {"Orders":[{"OrderId":"1799","Location":"London"}]}
Ok really nice, so far it works well !
But what I need now, is to create a SELECT query which returns:
ProductId IngestTime ProductOrderId ProductLocation
9180 20171025145034 299 NY
9180 20171025145034 499 LA
8251 20171026114034 1799 London
I really need a portable SQL query which works well for my two cases (with or without the tag "OrderId").
So far, I tried many combinations by using 'explode', 'get_json_object' and so on, but I still haven't found the right SQL query.
Thanks a lot for your help :-)
回答1:
you can try
CREATE EXTERNAL TABLE product(productid String,ingesttime String, productorders array<struct<orderid:String,location:string>> )
select productid,ingesttime, productorders.orderid[0] as orderid , productorders.location[0] as location from product
来源:https://stackoverflow.com/questions/46935265/hive-how-to-explode-a-json-column-embedded-in-a-csv-file