I have data that comes to me with many similar variables, with an additional variable which indicates which one of those similar variables I really want. Using a loop I
I noticed this answer from @josliber (https://stackoverflow.com/a/30279903/4606130) while working on a data.table solution, and it seems fast:
df[cbind(seq(df$var), df$var)]
[1] "3050" "2062" "1036" "4001" "3075" "4083" "1085" "3061"
One more vectorized option is a nested ifelse(). Its benefit is that, at least in my opinion, it is relatively readable compared to the other solutions. The obvious downside is that it does not scale as the number of variables grows.
ifelse(df$var == "yr1", df$yr1,
       ifelse(df$var == "yr2", df$yr2,
              ifelse(df$var == "yr3", df$yr3,
                     ifelse(df$var == "yr4", df$yr4, NA))))
[1] 3050 2062 1036 4001 3075 4083 1085 3061
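If the number of candidate columns grows, the same idea can be written without nesting by filling the result one distinct value of var at a time. The following is only a minimal sketch. It assumes the sample data from this thread (rebuilt here so the snippet runs on its own), that var is stored as character, and that every value of var names an existing column. `picked` and `rows` are illustrative names.

```r
# Sample data rebuilt from the thread so the snippet is self-contained.
df <- data.frame(
  yr1 = c(1090, 1026, 1036, 1056, 1088, 1019, 1085, 1096),
  yr2 = c(2066, 2062, 2006, 2020, 2017, 2065, 2036, 2072),
  yr3 = c(3050, 3071, 3098, 3037, 3075, 3089, 3020, 3061),
  yr4 = c(4012, 4026, 4038, 4001, 4037, 4083, 4032, 4045),
  var = c("yr3", "yr2", "yr1", "yr4", "yr3", "yr4", "yr1", "yr3"),
  stringsAsFactors = FALSE
)

# Start with NA everywhere, then fill in one distinct value of var at a time.
picked <- rep(NA_real_, nrow(df))
for (v in unique(df$var)) {
  rows <- df$var == v            # rows whose var points at column v
  picked[rows] <- df[[v]][rows]  # copy those rows from column v
}
picked
#[1] 3050 2062 1036 4001 3075 4083 1085 3061
```

The loop runs once per distinct column name rather than once per row, so it stays short however many yr columns there are.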
I like the syntax of dplyr and tidyr:
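df$ID = 1:nrow(df)
library(dplyr)
library(tidyr)
df %>%
    gather(year, value, yr1:yr4) %>%   # long format: one row per (ID, year)
    filter(var == year) %>%            # keep only the year named by var
    select(-year) %>%                  # year now duplicates var
    arrange(ID)                        # restore the original row order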
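In recent tidyr releases gather() is documented as superseded by pivot_longer(). As a hedged sketch of the same reshape-then-filter idea in that newer interface (the data frame is rebuilt from the sample data in this thread, and `res` is an illustrative name):

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the thread so the snippet is self-contained.
df <- data.frame(
  yr1 = c(1090, 1026, 1036, 1056, 1088, 1019, 1085, 1096),
  yr2 = c(2066, 2062, 2006, 2020, 2017, 2065, 2036, 2072),
  yr3 = c(3050, 3071, 3098, 3037, 3075, 3089, 3020, 3061),
  yr4 = c(4012, 4026, 4038, 4001, 4037, 4083, 4032, 4045),
  var = c("yr3", "yr2", "yr1", "yr4", "yr3", "yr4", "yr1", "yr3"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(ID = row_number()) %>%            # remember the original row
  pivot_longer(yr1:yr4,
               names_to = "year",
               values_to = "value") %>%    # one row per (ID, year)
  filter(var == year) %>%                  # keep the year named by var
  select(ID, var, value)

res$value
```

Because pivot_longer() keeps the rows of each original observation together, the filtered result should already be in ID order, so no arrange() step is used here.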
We can use row/column (matrix) indexing: pair each row number with the position of the column named in 'var'. It should be fast compared to the loop.
df[-ncol(df)][cbind(1:nrow(df),match(df$var,head(names(df),-1)))]
#[1] 3050 2062 1036 4001 3075 4083 1085 3061
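To make the indexing step easier to follow, here is the same technique broken into named pieces. This is a sketch on the sample data from this thread (rebuilt so it runs on its own). `yr_cols`, `idx` and `vals` are illustrative names, and the candidate columns are taken to be everything except var.

```r
# Sample data rebuilt from the thread so the snippet is self-contained.
df <- data.frame(
  yr1 = c(1090, 1026, 1036, 1056, 1088, 1019, 1085, 1096),
  yr2 = c(2066, 2062, 2006, 2020, 2017, 2065, 2036, 2072),
  yr3 = c(3050, 3071, 3098, 3037, 3075, 3089, 3020, 3061),
  yr4 = c(4012, 4026, 4038, 4001, 4037, 4083, 4032, 4045),
  var = c("yr3", "yr2", "yr1", "yr4", "yr3", "yr4", "yr1", "yr3"),
  stringsAsFactors = FALSE
)

yr_cols <- setdiff(names(df), "var")   # candidate columns: yr1 ... yr4

# Two-column index matrix: row i is paired with the column position named by var[i].
idx <- cbind(seq_len(nrow(df)), match(df$var, yr_cols))

# Indexing a matrix with a two-column matrix picks one cell per index row.
vals <- as.matrix(df[yr_cols])[idx]
vals
#[1] 3050 2062 1036 4001 3075 4083 1085 3061
```

Converting only the numeric columns to a matrix keeps the result numeric. A value of var that matches no column would give an NA index and therefore an NA result.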
For variety, here is a data.table solution (it should be slow compared to the indexing above). Convert the 'data.frame' to a 'data.table' with setDT(df), then, grouping by row number, use get() to fetch the column named by 'var' after converting it to character class.
library(data.table)
setDT(df)[, ycode := get(as.character(var)) , 1:nrow(df)]
df
#    yr1  yr2  yr3  yr4 var ycode
#1: 1090 2066 3050 4012 yr3  3050
#2: 1026 2062 3071 4026 yr2  2062
#3: 1036 2006 3098 4038 yr1  1036
#4: 1056 2020 3037 4001 yr4  4001
#5: 1088 2017 3075 4037 yr3  3075
#6: 1019 2065 3089 4083 yr4  4083
#7: 1085 2036 3020 4032 yr1  1085
#8: 1096 2072 3061 4045 yr3  3061
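Grouping by every row calls get() once per row. A possible variation, sketched here under the assumption that 'var' holds only names of existing columns, is to group by 'var' itself so that get() runs once per distinct value. The table is rebuilt from the sample data in this thread, and `dt` is an illustrative name.

```r
library(data.table)

# Sample data rebuilt from the thread so the snippet is self-contained.
dt <- data.table(
  yr1 = c(1090, 1026, 1036, 1056, 1088, 1019, 1085, 1096),
  yr2 = c(2066, 2062, 2006, 2020, 2017, 2065, 2036, 2072),
  yr3 = c(3050, 3071, 3098, 3037, 3075, 3089, 3020, 3061),
  yr4 = c(4012, 4026, 4038, 4001, 4037, 4083, 4032, 4045),
  var = c("yr3", "yr2", "yr1", "yr4", "yr3", "yr4", "yr1", "yr3")
)

# One group per distinct value of var. Inside a group, var is that single
# value, so get() looks up the matching column once per group.
dt[, ycode := get(as.character(var)), by = var]
dt$ycode
```

Since := assigns by reference, the rows stay in their original order and only the new 'ycode' column is added.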