Load JSON array into Pig

日久生厌 2021-01-28 08:02

I have a JSON file with the following format:

[
  {
    "id": 2,
    "createdBy": 0,
    "status": 0,
    "utcTime": "Oct 14, 2014 4:49:47 PM",
    "pl


        
1 answer
  •  [愿得一人]
    2021-01-28 08:26

    I faced a similar problem some time ago and later learned that Pig's built-in JsonLoader does not support multiline JSON. It expects each JSON record to be on a single line.

    Instead of the native JsonLoader, I suggest using the Elephant Bird JSON loader, which handles JSON input well.

    You can download the jars from the link below:

    http://www.java2s.com/Code/Jar/e/elephant.htm
    

    I changed your input to a single line, wrapped the array in an object under the key test, and loaded it through Elephant Bird as shown below.

    input.json
    {"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}
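One way to produce a single-line file like the one above from the original multiline array is sketched here in Python. It assumes the array fits in memory. The function name `wrap_array` is hypothetical, and the key `test` is simply the one used in this answer.

```python
import json


def wrap_array(text, key="test"):
    """Wrap the text of a JSON array in an object and return it as one line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps emits no raw newlines unless indent is set, so the result
    # is a single line regardless of how the input was formatted.
    return json.dumps({key: records})


if __name__ == "__main__":
    multiline = """[
      {"id": 2, "status": 0},
      {"id": 4, "status": 0}
    ]"""
    print(wrap_array(multiline))
```

Writing the returned string to input.json gives a file in the shape the script below expects.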
    
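    PigScript:
    -- Register the Elephant Bird jars (adjust the paths to your environment).
    REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
    REGISTER '/tmp/elephant-bird-pig-4.1.jar';

    -- Load each line as a map; '-nestedLoad' keeps nested objects and arrays.
    A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
    -- Flatten the value stored under the 'test' key.
    B = FOREACH A GENERATE FLATTEN($0#'test');
    -- Flatten again so that each array element becomes its own row, named mymap.
    C = FOREACH B GENERATE FLATTEN($0) AS mymap;
    -- Project three fields from each map.
    D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
    DUMP D;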
    
    Output:
    (2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
    (4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
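To check the expected rows without running Pig, the same projection can be mirrored in plain Python. This is only an illustration of what the script's flatten-and-lookup steps compute, not how Pig executes them. `project` is a hypothetical helper name, and the sample line is shortened to three fields.

```python
import json


def project(line, key="test", fields=("id", "placeName", "status")):
    """Return one tuple per element of the array stored under `key`."""
    doc = json.loads(line)
    # Mirrors relations B and C (flatten the array) and D (look up map keys).
    return [tuple(item.get(f) for f in fields) for item in doc[key]]


if __name__ == "__main__":
    line = ('{"test":[{"id": 2,"placeName": "A","status": 0},'
            '{"id": 4,"placeName": "B","status": 0}]}')
    for row in project(line):
        print(row)
```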
    
