How do you deal with empty or missing input files in Apache Pig?

星月不相逢 2021-01-12 14:40

Our workflow uses an AWS Elastic MapReduce (EMR) cluster to run a series of Pig jobs that manipulate a large amount of data into aggregated reports. Unfortunately, the input data is

2 answers

北海茫月 (original poster)

2021-01-12 15:25

The approach I've been using is to run Pig scripts from a shell script. One of my jobs reads data from six different input directories, so I've written a Pig script fragment for each input.

The shell script checks whether each input exists and assembles a final Pig script from the fragments whose input is present.

It then executes the final Pig script. I know it's a bit of a Rube Goldberg approach, but so far so good. :-)
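
For anyone who wants to try the same idea, here is a minimal sketch of the assembly step. It is written in Python rather than shell, and it is not the actual script described above. The names assemble_pig_script and hdfs_exists, the (path, fragment) pair layout, and the footer argument are all hypothetical. It assumes each fragment is a self-contained piece of Pig Latin tied to one input path.

```python
import os
import subprocess
from typing import Callable, Iterable, Tuple


def assemble_pig_script(
    fragments: Iterable[Tuple[str, str]],
    footer: str = "",
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Join the fragments whose input path exists, then append the footer.

    `fragments` is a sequence of (input_path, pig_fragment) pairs
    (a hypothetical layout, one pair per input).
    """
    # Keep only the fragments whose input is actually present.
    kept = [fragment for path, fragment in fragments if exists(path)]
    if not kept:
        # Nothing to process: fail early instead of running an empty job.
        raise FileNotFoundError("none of the input paths exist")
    if footer:
        kept.append(footer)
    return "\n".join(kept)


def hdfs_exists(path: str) -> bool:
    """Existence check for HDFS paths using `hadoop fs -test -e`.

    Assumes the `hadoop` command is on the PATH; an exit code of 0
    means the path exists.
    """
    return subprocess.run(["hadoop", "fs", "-test", "-e", path]).returncode == 0
```

On a cluster you would pass exists=hdfs_exists, write the returned text to a file, and run it with something like pig -f final.pig. Note that any statement in the footer (a UNION, for example) can only reference aliases defined by fragments that were actually included, so the footer may need to be built from the same existence checks.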
