How can I add row numbers for rows in PIG or HIVE?

半阙折子戏 2020-12-17 02:18

I have a problem when adding row numbers using Apache Pig. The problem is that I have a STR_ID column and I want to add a ROW_NUM column for the data in STR_ID, which is the

8 Answers
  • 2020-12-17 03:00

    In Hive:

    Query

    select str_id,row_number() over() from tabledata;
    

    Output

    3D64B18BC842      1
    BAECEFA8EFB6      2
    346B13E4E240      3
    6D8A9D0249B4      4
    9FD024AA52BA      5
    
  • 2020-12-17 03:05

    Facebook published a number of Hive UDFs, including NumberRows. Depending on your Hive version (0.8, I believe), you may need to annotate the class with @UDFType(stateful = true).

  • 2020-12-17 03:07

    Here is an example that worked for me.

    Step 1. Register the row_sequence() function, which generates an auto-incrementing ID

    add jar /Users/trongtran/research/hadoop/dev/hive-0.9.0-bin/lib/hive-contrib-0.9.0.jar;
    drop temporary function row_sequence;
    create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
    

    Step 2. Insert the unique ID together with STR_ID

    INSERT OVERWRITE TABLE new_table
    SELECT 
        row_sequence(),
        STR_ID
    FROM old_table;
    
  • 2020-12-17 03:09

    In Hive:

    select
    str_id, ROW_NUMBER() OVER() as row_num 
    from myTable;
    
  • 2020-12-17 03:13

    Pig 0.11 introduced a RANK operator that can be used for this purpose.
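
    A minimal sketch of how that might look, assuming the input is a single-column file. The path 'str_ids.txt' and the relation names are hypothetical. With no BY clause, RANK prepends a sequential number to each tuple, and the new field is named rank_<relation>:

    -- Load the IDs (hypothetical path and schema)
    ids = LOAD 'str_ids.txt' AS (str_id:chararray);
    -- RANK without BY prepends a sequential number in a field named rank_ids
    ranked = RANK ids;
    -- Reorder the columns into (STR_ID, ROW_NUM)
    result = FOREACH ranked GENERATE str_id, rank_ids AS row_num;
    DUMP result;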

  • 2020-12-17 03:17

    Since version 0.11, Hive supports analytic functions such as lead, lag, and row_number.

    https://issues.apache.org/jira/browse/HIVE-896
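
    As an illustration, on Hive 0.11 or later these functions could be combined as below, reusing the tabledata table from the first answer. The ORDER BY inside OVER() is an assumption added here: it makes the numbering repeatable, whereas an empty OVER() numbers rows in whatever order they happen to be read.

    -- Number the rows in str_id order and show the previous row's value
    select str_id,
           row_number() over (order by str_id) as row_num,
           lag(str_id) over (order by str_id) as prev_str_id
    from tabledata;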
