Question
I have a dataset like the following.
dat1 <- read.table(header=TRUE, text="
ID Pa Gu Ta
8645 Rel345 Gel294 Tel452
6228 Rel345 Gel294 Tel467
5830 Rel345 Gel294 Tel467
1844 Rel345 Gel295 Tel467
4461 Rel345 Gel295 Tel467
2119 Rel345 Gel294 Tel452
1821 Rel345 Gel294 Tel467
6851 Rel345 Gel294 Tel467
4214 Rel345 Gel294 Tel452
2589 Rel346 Gel294 Tel467
2116 Rel347 Gel294 Tel452
8523 Rel348 Gel295 Tel468
2603 Rel348 Gel295 Tel468
2801 Rel348 Gel295 Tel452
1485 Rel348 Gel295 Tel468
2116 Rel348 Gel295 Tel452
8753 Rel348 Gel295 Tel452
4277 Rel348 Gel295 Tel468
7053 Rel348 Gel295 Tel468
3320 Rel348 Gel295 Tel452
7974 Rel348 Gel295 Tel468
")
dat1
ID Pa Gu Ta
1 8645 Rel345 Gel294 Tel452
2 6228 Rel345 Gel294 Tel467
3 5830 Rel345 Gel294 Tel467
4 1844 Rel345 Gel295 Tel467
5 4461 Rel345 Gel295 Tel467
6 2119 Rel345 Gel294 Tel452
7 1821 Rel345 Gel294 Tel467
8 6851 Rel345 Gel294 Tel467
9 4214 Rel345 Gel294 Tel452
10 2589 Rel346 Gel294 Tel467
11 2116 Rel347 Gel294 Tel452
12 8523 Rel348 Gel295 Tel468
13 2603 Rel348 Gel295 Tel468
14 2801 Rel348 Gel295 Tel452
15 1485 Rel348 Gel295 Tel468
16 2116 Rel348 Gel295 Tel452
17 8753 Rel348 Gel295 Tel452
18 4277 Rel348 Gel295 Tel468
19 7053 Rel348 Gel295 Tel468
20 3320 Rel348 Gel295 Tel452
21 7974 Rel348 Gel295 Tel468
The values in the three rightmost columns should be recoded according to the following lookup table:
dat2 <- read.table(header=TRUE, text="
Att New_Att
Rel345 Rel_123
Rel346 Rel_124
Rel347 Rel_125
Rel348 Rel_126
Gel294 Gela_134
Gel295 Gela_135
Tel452 Tel_111
Tel467 Tel_112
Tel468 Tel_113
")
dat2
Att New_Att
1 Rel345 Rel_123
2 Rel346 Rel_124
3 Rel347 Rel_125
4 Rel348 Rel_126
5 Gel294 Gela_134
6 Gel295 Gela_135
7 Tel452 Tel_111
8 Tel467 Tel_112
9 Tel468 Tel_113
Using the revalue function from the plyr package, I can make the changes like this:
library(plyr)
dat1$Pa <- revalue(dat1$Pa, c("Rel345"="Rel_123", "Rel346"="Rel_124", "Rel347"="Rel_125",
                              "Rel348"="Rel_126"))
dat1$Gu <- revalue(dat1$Gu, c("Gel294"="Gela_134", "Gel295"="Gela_135"))
dat1$Ta <- revalue(dat1$Ta, c("Tel452"="Tel_111", "Tel467"="Tel_112", "Tel468"="Tel_113"))
dat1
ID Pa Gu Ta
1 8645 Rel_123 Gela_134 Tel_111
2 6228 Rel_123 Gela_134 Tel_112
3 5830 Rel_123 Gela_134 Tel_112
4 1844 Rel_123 Gela_135 Tel_112
5 4461 Rel_123 Gela_135 Tel_112
6 2119 Rel_123 Gela_134 Tel_111
7 1821 Rel_123 Gela_134 Tel_112
8 6851 Rel_123 Gela_134 Tel_112
9 4214 Rel_123 Gela_134 Tel_111
10 2589 Rel_124 Gela_134 Tel_112
11 2116 Rel_125 Gela_134 Tel_111
12 8523 Rel_126 Gela_135 Tel_113
13 2603 Rel_126 Gela_135 Tel_113
14 2801 Rel_126 Gela_135 Tel_111
15 1485 Rel_126 Gela_135 Tel_113
16 2116 Rel_126 Gela_135 Tel_111
17 8753 Rel_126 Gela_135 Tel_111
18 4277 Rel_126 Gela_135 Tel_113
19 7053 Rel_126 Gela_135 Tel_113
20 3320 Rel_126 Gela_135 Tel_111
21 7974 Rel_126 Gela_135 Tel_113
My real dataset has 1 million rows, and some of the variables have more than 200 categories, so typing out every mapping as in the code above is not practical. I want to recode the values by reading the mapping from dat2.
Answer 1:
We loop through the columns of 'dat1' except the 'ID' column, match each column against the 'Att' column of 'dat2', and use the resulting numeric index to replace the column elements with the corresponding elements of 'New_Att'.
dat1[-1] <- lapply(dat1[-1], function(x) dat2$New_Att[match(x, dat2$Att)])
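One thing to watch with the match() approach: any value in 'dat1' that has no entry in dat2$Att comes back as NA. The following is a minimal sketch of one way to keep the original value in that case. It assumes the columns are character rather than factor (stringsAsFactors = FALSE), uses a small subset of the question's data plus one made-up unmapped code ("Rel999"), and recode_col is a hypothetical helper name, not part of any package.

```r
# Small illustrative data: a subset of the question's values,
# plus one made-up code ("Rel999") that has no entry in dat2.
dat1 <- data.frame(
  ID = c(8645, 6228, 2589, 9999),
  Pa = c("Rel345", "Rel345", "Rel346", "Rel999"),
  Gu = c("Gel294", "Gel294", "Gel294", "Gel295"),
  Ta = c("Tel452", "Tel467", "Tel467", "Tel468"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Att     = c("Rel345", "Rel346", "Gel294", "Gel295",
              "Tel452", "Tel467", "Tel468"),
  New_Att = c("Rel_123", "Rel_124", "Gela_134", "Gela_135",
              "Tel_111", "Tel_112", "Tel_113"),
  stringsAsFactors = FALSE
)

# recode_col (hypothetical helper): look up each value in map$Att and
# fall back to the original value where match() finds nothing (NA).
recode_col <- function(x, map) {
  idx <- match(x, map$Att)
  ifelse(is.na(idx), x, map$New_Att[idx])
}

# Apply the helper to every column except ID.
dat1[-1] <- lapply(dat1[-1], recode_col, map = dat2)
dat1
```

Whether to keep the old value or leave NA depends on the data; leaving NA makes unmapped codes easier to spot, while the fallback avoids losing information.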
Or we can convert the columns to a matrix and match as before. This expression returns the recoded matrix; it does not modify 'dat1' by itself.
`dim<-`(dat2[,2][match(as.matrix(dat1[-1]), dat2[,1])], dim(dat1[-1]))
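To write the matrix result back into the data frame, something along these lines should work. This is a sketch under the assumption that the columns are character (stringsAsFactors = FALSE) and that every value has an entry in dat2; it uses two rows of the question's data, and the intermediate name m is only for illustration.

```r
# Two rows of the question's data, read as character columns.
dat1 <- data.frame(
  ID = c(8645, 8523),
  Pa = c("Rel345", "Rel348"),
  Gu = c("Gel294", "Gel295"),
  Ta = c("Tel452", "Tel468"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Att     = c("Rel345", "Rel348", "Gel294", "Gel295", "Tel452", "Tel468"),
  New_Att = c("Rel_123", "Rel_126", "Gela_134", "Gela_135", "Tel_111", "Tel_113"),
  stringsAsFactors = FALSE
)

# Character matrix of the columns to recode (everything except ID).
m <- as.matrix(dat1[-1])

# A single match() call over all cells; assigning into m[] keeps the
# matrix dimensions and column names.
m[] <- dat2$New_Att[match(m, dat2$Att)]

# Convert back to columns and overwrite the originals.
dat1[-1] <- as.data.frame(m, stringsAsFactors = FALSE)
dat1
```

The matrix route does one lookup for all columns at once, at the cost of temporarily holding a character copy of those columns in memory.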
Source: https://stackoverflow.com/questions/33506724/revalue-attributes-from-multiple-columns