R data.table multi column recode/sub-assign [duplicate]

社会主义新天地 submitted on 2019-12-30 22:53:35

Question


Let DT be a data.table:

DT<-data.table(V1=sample(10),
               V2=sample(10),
               ...
               V9=sample(10))

Is there a better/simpler method to do multicolumn recode/sub-assign like this:

DT[V1==1 | V1==7,V1:=NA]
DT[V2==1 | V2==7,V2:=NA]
DT[V3==1 | V3==7,V3:=NA]
DT[V4==1 | V4==7,V4:=NA]
DT[V5==1 | V5==7,V5:=NA]
DT[V6==1 | V6==7,V6:=NA]
DT[V7==1 | V7==7,V7:=NA]
DT[V8==1 | V8==7,V8:=NA]
DT[V9==1 | V9==7,V9:=NA]

Variable names are completely arbitrary and do not necessarily contain numbers. There are many columns (Vx:Vx), and a single recode pattern applies to all of them (NAME==1 | NAME==7, NAME:=something).

Further, how can NAs in multiple columns be sub-assigned to some other value? E.g., in data.frame style:

data[,columns][is.na(data[,columns])] <- a_value

Answer 1:


You could use set to replace values in multiple columns. According to ?set, it is fast because the overhead of [.data.table is avoided. We use a for loop over the columns and, in each column j, replace the values in the rows selected by i with NA.

for (j in seq_along(DT)) {
  set(DT, i = which(DT[[j]] %in% c(1, 7)), j = j, value = NA)
}
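
Since the question mentions arbitrary column names and only some columns needing the recode, the loop can also run over a character vector of names rather than over every column. The following is a minimal sketch under assumptions: the id column and the cols vector are hypothetical and serve only to show a column that is left untouched.

```r
library(data.table)

set.seed(24)
# 'id' is a hypothetical extra column that must NOT be recoded
DT <- data.table(id = 1:10, V1 = sample(10), V2 = sample(10), V3 = sample(10))

# Hypothetical selection of the columns to recode, given by name
cols <- c("V1", "V2", "V3")

for (col in cols) {
  # Row numbers where the current column holds 1 or 7
  rows <- which(DT[[col]] %in% c(1, 7))
  # Overwrite those cells by reference; NA_integer_ matches the column type
  set(DT, i = rows, j = col, value = NA_integer_)
}
```

Here set takes the column name in j, so the same loop works regardless of where the columns sit in the table.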

EDIT: Included @David Arenburg's comments.

data

set.seed(24)
DT<-data.table(V1=sample(10), V2= sample(10), V3= sample(10))
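
The second part of the question, replacing NA values in several columns with another value, is not covered by the answer above, but the same set pattern appears to carry over. This is a sketch, not the answerer's code: the small table and the replacement a_value = 0L are assumptions chosen for illustration.

```r
library(data.table)

# Small illustrative table with one NA per column
DT <- data.table(V1 = c(1L, NA, 3L),
                 V2 = c(NA, 5L, 6L),
                 V3 = c(7L, 8L, NA))

a_value <- 0L        # hypothetical replacement value
cols <- names(DT)    # or any subset of column names

for (col in cols) {
  # Row numbers where the current column is NA
  rows <- which(is.na(DT[[col]]))
  # Replace those cells by reference
  set(DT, i = rows, j = col, value = a_value)
}
```

This mirrors the data.frame idiom data[,columns][is.na(data[,columns])] <- a_value, but modifies the table in place instead of copying it.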


Source: https://stackoverflow.com/questions/31720734/r-data-table-multi-column-recode-sub-assign
