Since there are no else or default statements in Pig's SPLIT operation, what would be the most elegant way to do the following? I'm not a big fan of having to copy-paste code.
It is. Checking out the docs for SPLIT, you want to use OTHERWISE. For example:
SPLIT data INTO
    good_data IF (value > 0),
    good_data_big_values IF (value > 100),
    bad_data OTHERWISE;
So you almost got it. :)
NOTE: SPLIT can put a single row into both good_data and good_data_big_values if, for example, value was 150. I don't know if this is what you want, but you should be aware of it regardless. This also means that bad_data will only contain rows that match neither condition, i.e. rows where value is 0 or less.
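If you would rather have each row land in exactly one relation, one possible sketch is to bound the first condition from above. The field name value and the cutoff of 100 come from the example above; that the ranges should be mutually exclusive is an assumption about what you want.

```pig
-- Sketch: mutually exclusive branches, so a row with value = 150
-- goes only to good_data_big_values.
SPLIT data INTO
    good_data IF (value > 0 AND value <= 100),
    good_data_big_values IF (value > 100),
    bad_data OTHERWISE;
```

With this variant, bad_data still receives the rows where value is 0 or less.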
You could write an IsGood() UDF where all the conditions are checked. Then your Pig script is simply:
SPLIT data
INTO good_data IF (IsGood(data)),
good_data_big_values IF (IsGood(data) AND (value > 100)),
bad_data IF (NOT IsGood(data));
Another option might be to use a macro.
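A minimal sketch of the macro idea, so the condition lives in one place instead of being copy-pasted. The macro name split_good_bad and the condition value > 0 are hypothetical stand-ins for your real checks, and the sketch uses FILTER inside the macro rather than SPLIT:

```pig
-- Hypothetical macro: value > 0 stands in for the real "good" conditions.
DEFINE split_good_bad(rel) RETURNS good, bad {
    $good = FILTER $rel BY (value > 0);
    $bad = FILTER $rel BY NOT (value > 0);
};

-- Expand the macro; both output relations come from the same definition.
good_data, bad_data = split_good_bad(data);
```

If the conditions change later, only the macro body needs editing.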