pig - split, lack of default or if/else

谎友^ 2021-01-05 18:01

Since the Pig SPLIT operation has no else or default clause, what would be the most elegant way to do the following? I'm not a big fan of having to copy-paste code.

2 answers
  • 2021-01-05 18:35

    There is one. According to the docs for SPLIT, you want to use OTHERWISE. For example:

    SPLIT data
        INTO good_data IF (value > 0),
             good_data_big_values IF (value > 100),
             bad_data OTHERWISE;
    

    So you almost got it. :)

    NOTE: SPLIT can put a single row into both good_data and good_data_big_values if, for example, value was 150. I don't know if this is what you want, but you should be aware of it regardless. This also means that bad_data will only contain rows where value is 0 or less.
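
    If the overlap described in the note is not wanted, one way to avoid it is to make the conditions mutually exclusive. This is only a sketch: it assumes the same relation data and numeric field value as the example in this answer, and the upper bound on good_data is an illustrative choice, not something the question specified.

    ```
    -- Each row with a non-null value matches exactly one branch:
    -- values above 100 go only to good_data_big_values,
    -- values in (0, 100] go only to good_data,
    -- everything else falls through to bad_data.
    SPLIT data
        INTO good_data_big_values IF (value > 100),
             good_data IF (value > 0 AND value <= 100),
             bad_data OTHERWISE;
    ```

    With these conditions a row whose value is 150 lands only in good_data_big_values.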

  • 2021-01-05 18:49

    You could write an IsGood() UDF in which all the conditions are checked. Then your Pig script is simply:

    SPLIT data
        INTO good_data IF (IsGood(data)),
             good_data_big_values IF (IsGood(data) AND value > 100),
             bad_data IF (NOT IsGood(data));
    

    Another option might be to use a macro.
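
    A minimal sketch of what such a macro could look like, assuming the input relation has a numeric field named value. The macro name split_good_bad and its threshold parameter are hypothetical, made up for illustration. It uses two FILTER statements rather than SPLIT so the good/bad condition is written in one place and reused wherever the macro is called.

    ```
    -- Hypothetical macro: returns two relations, rows above the
    -- threshold and the remaining rows with a non-null value.
    DEFINE split_good_bad(rel, threshold) RETURNS good, bad {
        $good = FILTER $rel BY value > $threshold;
        $bad  = FILTER $rel BY NOT (value > $threshold);
    };

    -- Usage: both output aliases are assigned in a single call.
    good_data, bad_data = split_good_bad(data, 0);
    ```

    Rows whose value is null satisfy neither filter, so they would need a separate IS NULL check if they should be kept.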
