I have two data.frames, (df1, df2) and I would like to replace the values in columns P1-P10 the letters with the values of df1$V2
but keeping the first two colu
Try some *pply
magic:
lookup<-tapply(df1$V2, df1$V1, unique) #Creates a lookup table
lookup.function<-function(x) as.numeric(lookup[as.character(x)]) #The function
df4<-data.frame(df2[,1:2], apply(df2[,3:12], 2,lookup.function )) #Builds the output
Update:
The *pply
family is much faster than merge
, at least an order of magnitude. Check this out
num<-1000
df1 = data.frame(V1=LETTERS, V2=rnorm(26))
df2<-data.frame(cbind(first=1:num,second=1:num, matrix(sample(LETTERS, num^2, replace=T), nrow=num, ncol=num)))
start<-Sys.time()
lookup<-tapply(df1$V2, df1$V1, unique)
lookup.function<-function(x) as.numeric(lookup[as.character(x)])
df4<-data.frame(cbind(df2[,1:2], data.frame(apply(df2[,3:(num+2)], 2, lookup.function ))))
(difftime(Sys.time(),start))
start<-Sys.time()
df4.merge <- "[<-"(df2, 3:num, value = df1[match(as.character(unlist(df2[3:num])), as.character(df1[[1]])), 2])
(difftime(Sys.time(),start))
sum(df4==df4.merge)==num^2
For 3000 columns and rows the *pply
combination needs 4.3s whereas merge
needs about 22s on my slow Intel. And it scales nicely. For 4000 columns and rows the respective times are 7.4 sec and 118 sec.