I have a dataset with Stock Codes ranging from 2 to 90214 (around 3000 unique values). Obviously, many values between 2 and 90214 are skipped. I want t
We can convert the numbers to a factor and then coerce the factor to numeric:
as.numeric(factor(df$StockCode))
#[1] 1 3 2 1 3
If the numbering needs to start from 100, we can add 99 to the result:
as.numeric(factor(df$StockCode)) + 99
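Identical stock codes share the same factor level, so they get the same number after conversion to numeric. Note that factor() sorts its levels by value, so the new numbers follow the sorted order of the codes.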
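As a minimal sketch of the factor approach, assuming a small made-up data frame (the values below are illustrative and not taken from the question's dataset):

```r
# Hypothetical sample data; the real StockCode column has ~3000 unique values
df <- data.frame(StockCode = c(2, 90214, 350, 2, 90214))

# factor() sorts the unique values (2 < 350 < 90214) into levels 1, 2, 3;
# as.numeric() then returns each row's level index
new_id <- as.numeric(factor(df$StockCode))
new_id
#[1] 1 3 2 1 3

# Shift the IDs so they start at 100 instead of 1
shifted_id <- new_id + 99
shifted_id
#[1] 100 102 101 100 102
```

The repeated codes 2 and 90214 get the same new number each time they appear, and the gaps between the original codes disappear.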
We can use match to get the index of each value among the unique values, and then add 99:
df1$Stock_Code <- match(df1$Stock_Code, unique(df1$Stock_Code)) + 99
df1$Stock_Code
#[1] 100 101 102 100 101
Another option is to convert to a factor whose levels are in order of first appearance, and coerce it to integer:
with(df1, as.integer(factor(Stock_Code, levels = unique(Stock_Code))) + 99)
#[1] 100 101 102 100 101
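The match and factor(levels = unique(...)) approaches can be compared side by side. This is only a sketch on hypothetical data (the df1 values are made up for illustration); it suggests that both number the codes by order of first appearance, whereas a plain factor() call numbers them by sorted value:

```r
# Hypothetical sample data (illustrative values only)
df1 <- data.frame(Stock_Code = c(350, 2, 90214, 350, 2))

# match(): position of each code in the vector of unique codes,
# so the numbering follows order of first appearance
by_match <- match(df1$Stock_Code, unique(df1$Stock_Code)) + 99
by_match
#[1] 100 101 102 100 101

# factor() with explicit levels reproduces the same numbering
by_levels <- as.integer(factor(df1$Stock_Code,
                               levels = unique(df1$Stock_Code))) + 99
identical(by_match, by_levels)
#[1] TRUE

# Default factor() sorts the levels (2 < 350 < 90214) instead
by_sorted <- as.integer(factor(df1$Stock_Code)) + 99
by_sorted
#[1] 101 100 102 101 100
```

Which numbering to prefer depends on whether the new IDs should preserve the numeric order of the original codes or the order in which they occur in the data.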
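Using dplyr (dense_rank numbers the values by sorted order, without gaps, so identical codes get the same rank):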
library(dplyr)
dense_rank(df$Stock_Code) + 99