I have a problem when adding row numbers using Apache Pig. The problem is that I have a STR_ID column and I want to add a ROW_NUM column for the data in STR_ID, which is the
In Hive:
Query
select str_id,row_number() over() from tabledata;
Output
3D64B18BC842 1
BAECEFA8EFB6 2
346B13E4E240 3
6D8A9D0249B4 4
9FD024AA52BA 5
Facebook published a number of Hive UDFs, including NumberRows. Depending on your Hive version (I believe 0.8), you may need to add an attribute to the class (stateful=true).
Here is an answer based on my own example.
Step 1. Define the row_sequence() function to generate an auto-incrementing ID
add jar /Users/trongtran/research/hadoop/dev/hive-0.9.0-bin/lib/hive-contrib-0.9.0.jar;
drop temporary function row_sequence;
create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
Step 2. Insert the unique ID and STR_ID into the new table
INSERT OVERWRITE TABLE new_table
SELECT
row_sequence(),
STR_ID
FROM old_table;
In Hive:
select
str_id, ROW_NUMBER() OVER() as row_num
from myTable;
Pig 0.11 introduced a RANK operator that can be used for this purpose.
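A minimal sketch of how that could look, assuming the data sits in a file called 'tabledata' with a single chararray column (the path and the alias names here are made up for illustration). When RANK is used without a BY clause, it simply prepends a sequential number, starting at 1, to each tuple:

```pig
-- Hypothetical input: one STR_ID value per line in 'tabledata'
data = LOAD 'tabledata' AS (str_id:chararray);

-- RANK with no BY clause prepends a sequential number to every tuple
ranked = RANK data;

-- Reorder so the row number comes second; $0 is the field RANK added
result = FOREACH ranked GENERATE str_id, $0 AS row_num;

DUMP result;
```

Note that the numbers follow whatever order Pig reads the tuples in. If you need a specific order, RANK also accepts a BY clause (for example RANK data BY str_id), but that ranks by value, so tied values share a rank unless you add DENSE or make the key unique.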
Starting with version 0.11, Hive supports analytic functions such as LEAD and LAG, as well as ROW_NUMBER:
https://issues.apache.org/jira/browse/HIVE-896