How do you deal with empty or missing input files in Apache Pig?

星月不相逢 2021-01-12 14:40

Our workflow uses an AWS Elastic MapReduce cluster to run a series of Pig jobs that turn a large amount of data into aggregated reports. Unfortunately, the input data is

2 Answers
  •  北海茫月
    2021-01-12 15:25

    The approach I've been using is to run Pig scripts from a shell script. I have one job that gets data from six different input directories, so I've written a Pig script fragment for each input file.

    The shell script checks for the existence of each input file and assembles a final Pig script from the fragments whose input is present.

    It then executes the final Pig script. I know it's a bit of a Rube Goldberg approach, but so far so good. :-)
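
The fragment-assembly idea described in this answer could look roughly like the sketch below. It is not the answerer's actual script: the function names `input_exists` and `assemble_pig_script`, the `<name>.pig` fragment naming, and the directory layout are all assumptions made for illustration. The existence check runs against the local filesystem so the sketch works anywhere; on a real cluster it would presumably be replaced with `hadoop fs -test -e "$1"`.

```shell
#!/bin/sh
# Hypothetical layout (an assumption, not from the original answer):
#   <fragment_dir>/<name>.pig  holds the Pig statements for input <name>
#   <input_root>/<name>        is where that input is expected to appear

# Return 0 if the input exists.
# Local check for illustration; for HDFS or S3 paths, swap the body for:
#   hadoop fs -test -e "$1"
input_exists() {
    [ -e "$1" ]
}

# Usage: assemble_pig_script INPUT_ROOT FRAGMENT_DIR OUTPUT NAME...
# Concatenates the fragments whose input is present into OUTPUT.
# Returns non-zero if no fragment was included.
assemble_pig_script() {
    input_root=$1
    fragment_dir=$2
    output=$3
    shift 3

    : > "$output"    # start from an empty script
    included=0
    for name in "$@"; do
        if input_exists "$input_root/$name" && [ -f "$fragment_dir/$name.pig" ]; then
            cat "$fragment_dir/$name.pig" >> "$output"
            included=$((included + 1))
        else
            echo "skipping $name: input or fragment missing" >&2
        fi
    done

    # Signal failure when there is nothing to run
    [ "$included" -gt 0 ]
}
```

A caller might then run something like `assemble_pig_script /data/in fragments final.pig clicks views && pig -f final.pig`, where the paths and input names are placeholders. Returning non-zero when every input is missing lets the wrapper skip the Pig run entirely instead of submitting an empty script.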
