select count distinct using pig latin

佛祖请我去吃肉 2021-02-13 01:53

I need help with this Pig script. I am getting only a single record back. I am selecting two columns and doing a count(distinct) on a third, while also using a WHERE ... LIKE-style clause to find

3 Answers
  • 2021-02-13 02:30

    You could GROUP by domain and then count the number of distinct elements in each group using the nested FOREACH syntax:

    D = group C by domain;
    E = foreach D { 
        unique_segments = DISTINCT C.segment;
        generate group, COUNT(unique_segments) as segment_cnt;
    };
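
    For the case described in the question (two selected columns, a distinct count on a third, and a LIKE-style filter), the same idea could be sketched end to end as follows. The file name, delimiter, field names, and the pattern are all hypothetical, since the original script is not shown. Pig Latin has no LIKE operator; MATCHES takes a Java regular expression that must match the whole string.

    ```pig
    -- Hypothetical input: path, delimiter and schema are assumptions.
    A = LOAD 'input.tsv' USING PigStorage('\t')
        AS (domain:chararray, region:chararray, segment:chararray);

    -- Rough equivalent of SQL "WHERE domain LIKE '%example%'".
    B = FILTER A BY domain MATCHES '.*example.*';

    -- Group by the two columns that should appear in the output.
    D = GROUP B BY (domain, region);

    -- Count the distinct segments inside each group.
    E = FOREACH D {
        unique_segments = DISTINCT B.segment;
        GENERATE FLATTEN(group) AS (domain, region),
                 COUNT(unique_segments) AS segment_cnt;
    };

    DUMP E;
    ```

    This yields one row per (domain, region) pair. Getting only a single record usually means the grouping key is coarser than intended, for example GROUP ... ALL.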
    
  • 2021-02-13 02:33

    If you don't want the count broken down by group, you can use GROUP ... ALL instead:

    G = FOREACH (GROUP A ALL) {
        unique = DISTINCT A.field;
        GENERATE COUNT(unique) AS ct;
    };
    

    This will just give you a number.

  • 2021-02-13 02:37

    It is better to define this as a macro:

    DEFINE DISTINCT_COUNT(A, c) RETURNS dist {
      temp = FOREACH $A GENERATE $c;
      dist = DISTINCT temp;
      groupAll = GROUP dist ALL;
      $dist = FOREACH groupAll GENERATE COUNT(dist);
    }
    

    Usage:

    X = LOAD 'data' AS (x: int);

    Y = DISTINCT_COUNT(X, x);

    If you need the count inside a FOREACH instead, the easiest way is something like:

    ...GENERATE COUNT(Distinct(x))...

    Tested on Pig 0.12.
