Pig: loading a data file using an external schema file

无人及你 2021-01-18 19:54

I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using

<         


        
2 answers
  • 2021-01-18 20:06

    It's possible to load data with a schema file.

    When you store your data with the '-schema' flag, PigStorage writes a .pig_schema file to the output path; it holds the schema as JSON.

    You can use it when loading the data:

    B = LOAD '<>' USING PigStorage(',','-schema'); 
    

    You can check the schema that was picked up by running

    describe B;
    

    Check this good post for more details.

    This feature is available beginning with Pig 0.10.
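
    A minimal sketch of that round trip, assuming hypothetical paths and field names that are not from the question. The two parts are meant to be run as separate scripts: the schema is resolved when the LOAD statement is compiled, so the .pig_schema file has to exist before the second part starts.

    ```pig
    -- Part 1: store with '-schema' (paths and fields are illustrative only).
    raw = LOAD '/data/input/events.csv' USING PigStorage(',')
          AS (id:long, name:chararray);

    -- '-schema' makes PigStorage write a hidden .pig_schema JSON file
    -- into the output directory alongside the part files.
    STORE raw INTO '/data/output/events' USING PigStorage(',', '-schema');

    -- Part 2: run only after part 1 has finished.
    -- No AS clause here; the field names and types come from .pig_schema.
    B = LOAD '/data/output/events' USING PigStorage(',', '-schema');
    DESCRIBE B;
    ```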

  • 2021-01-18 20:11

    The AS clause is for specifying the schema directly, not the path to a schema file. The schema goes in parentheses, not in quotes:

     A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);
    

    Alternatively, a file named .pig_schema that contains the schema and sits in your input directory could work as well. I have never tried that, though. It must be a JSON file with the following syntax, where each "type" is a numeric Pig DataType code (for example, 15 is long and 55 is chararray):

    {"fields":[
            {"name":"type","type":55,"description":"Fu","schema":null},
            {"name":"id","type":15,"description":"Bar","schema":null},
            {"name":"nameFormat","type":55,"description":"Xu","schema":null}
        ] ,"version":0,"sortKeys":[],"sortKeyOrders":[]}
    

    This file is also generated if you specify the -schema option when storing with PigStorage.
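
    For the case in the question, where the schema file lives somewhere other than the data, one possible sketch is to copy it into the input directory under the name PigStorage looks for, and then load with '-schema'. Both paths below are hypothetical, and the sketch assumes the input is a directory you can write to and that the schema file is already in the JSON format shown above.

    ```pig
    -- Hypothetical locations: adjust both paths to your own layout.
    -- Copy the external schema next to the data as .pig_schema.
    fs -cp /schemas/events_schema.json /data/input/events/.pig_schema

    -- Load without an AS clause; the schema is read from .pig_schema.
    A = LOAD '/data/input/events' USING PigStorage('\u0001', '-schema');
    DESCRIBE A;
    ```

    If copying into the input directory is not an option, spelling the schema out in the AS clause remains the fallback.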
