Question
I have a data.frame df where the column x is populated with integers (1-9). I would like to update columns y and z based on the value of x as follows:
if x is 1,2, or 3 | y = 1 ## if x is 1,4, or 7 | z = 1
if x is 4,5, or 6 | y = 2 ## if x is 2,5, or 8 | z = 2
if x is 7,8, or 9 | y = 3 ## if x is 3,6, or 9 | z = 3
Below is a data.frame with the desired output for y and z:
df <- structure(list(x = c(1L, 2L, 3L, 3L, 4L, 2L, 1L, 2L, 5L, 2L,
1L, 6L, 3L, 7L, 3L, 2L, 1L, 4L, 3L, 2L), y = c(1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L
), z = c(1L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 3L, 3L,
1L, 3L, 2L, 1L, 1L, 3L, 2L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA,
-20L))
I can write a for-loop with multiple if statements to fill y and z row by row, but that doesn't seem very R-like: it is not vectorized. Is there a way to specify which numeric values correspond to which new numeric values, like a map or key that indicates what each existing value becomes?
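For reference, the row-by-row approach described above might look something like the sketch below. The asker's actual loop isn't shown, so this is an assumed reconstruction. It uses a small hypothetical data.frame named dat rather than the full df, and it assumes x only ever holds 1 to 9.

```r
# Hypothetical small input; assumes x only holds integers 1 to 9
dat <- data.frame(x = c(1L, 5L, 9L, 3L, 7L))
dat$y <- NA_integer_
dat$z <- NA_integer_

# Non-vectorized: visit each row and branch with if statements
for (i in seq_len(nrow(dat))) {
  xi <- dat$x[i]

  # y = 1 for 1-3, 2 for 4-6, 3 for 7-9
  if (xi %in% 1:3) {
    dat$y[i] <- 1L
  } else if (xi %in% 4:6) {
    dat$y[i] <- 2L
  } else {
    dat$y[i] <- 3L
  }

  # z = 1 for 1,4,7; 2 for 2,5,8; 3 for 3,6,9
  if (xi %in% c(1L, 4L, 7L)) {
    dat$z[i] <- 1L
  } else if (xi %in% c(2L, 5L, 8L)) {
    dat$z[i] <- 2L
  } else {
    dat$z[i] <- 3L
  }
}

print(dat)
```

Every row goes through an interpreted loop iteration and two if/else chains. The answers below replace this with whole-vector operations.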
Answer 1:
Solution #1: Lookup Vector
Assuming the mismatches I pointed out in my comment are mistakes in the data and not in the rules, you can accomplish this as follows:
x2y <- rep(1:3,each=3);
x2z <- rep(1:3,3);
df$y <- x2y[df$x];
df$z <- x2z[df$x];
df1 <- df; ## for identical() calls later
df;
## x y z
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 3 1 3
## 5 4 2 1
## 6 2 1 2
## 7 1 1 1
## 8 2 1 2
## 9 5 2 2
## 10 2 1 2
## 11 1 1 1
## 12 6 2 3
## 13 3 1 3
## 14 7 3 1
## 15 3 1 3
## 16 2 1 2
## 17 1 1 1
## 18 4 2 1
## 19 3 1 3
## 20 2 1 2
The above solution depends on the fact that the domain of x consists of contiguous integer values beginning at 1, so a direct index into a "lookup vector" suffices. If x began at a much higher number but was still contiguous, you could make this solution work by subtracting one less than the minimum of x before indexing.
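Here is a minimal sketch of that offset idea. It assumes a hypothetical x whose domain is the contiguous integers 101 to 109, which are the same nine rules shifted by 100. The offset is taken from the known domain minimum. Using min(x) of the data only gives the same result if the smallest domain value actually occurs in the data.

```r
# Hypothetical input whose domain is the contiguous integers 101..109
x <- c(101L, 105L, 109L, 103L)

# Lookup vectors for the nine possible values, in domain order
x2y <- rep(1:3, each = 3)  # 1 1 1 2 2 2 3 3 3
x2z <- rep(1:3, 3)         # 1 2 3 1 2 3 1 2 3

# Subtract one less than the smallest domain value so that 101 lands on index 1.
# (min(x) of the data only equals the domain minimum if 101 actually occurs.)
domain_min <- 101L
offset <- domain_min - 1L

y <- x2y[x - offset]
z <- x2z[x - offset]
print(y)  # 1 2 3 1
print(z)  # 1 2 3 3
```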
Solution #2: Lookup Table
If you don't like this assumption, then you can accomplish the task with a lookup table:
library('data.table');
lookup <- data.table(x=1:9,y=x2y,z=x2z,key='x');
lookup;
## x y z
## 1: 1 1 1
## 2: 2 1 2
## 3: 3 1 3
## 4: 4 2 1
## 5: 5 2 2
## 6: 6 2 3
## 7: 7 3 1
## 8: 8 3 2
## 9: 9 3 3
df[c('y','z')] <- lookup[df['x'],.(y,z)];
identical(df,df1);
## [1] TRUE
Or, using a base R approach:
lookup <- data.frame(x=1:9,y=x2y,z=x2z);
lookup;
## x y z
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 4 2 1
## 5 5 2 2
## 6 6 2 3
## 7 7 3 1
## 8 8 3 2
## 9 9 3 3
df[c('y','z')] <- lookup[match(df$x,lookup$x),c('y','z')];
identical(df,df1);
## [1] TRUE
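The following base R sketch shows why the lookup table removes the contiguity assumption. It uses hypothetical, non-contiguous keys (10 to 90 in steps of 10) standing in for 1 to 9. Because match() returns the row of lookup that holds each key, the keys can be any values.

```r
# Hypothetical lookup table: keys are neither contiguous nor starting at 1
lookup <- data.frame(
  x = (1:9) * 10L,           # 10 to 90 in steps of 10
  y = rep(1:3, each = 3),
  z = rep(1:3, 3)
)

dat <- data.frame(x = c(10L, 50L, 90L, 30L, 70L))

# match() gives, for each dat$x, the row of lookup holding that key
idx <- match(dat$x, lookup$x)
dat[c("y", "z")] <- lookup[idx, c("y", "z")]
print(dat)

# A key missing from the table yields NA instead of a wrong row
print(match(15L, lookup$x))  # NA
```

An unknown key therefore shows up as NA in y and z. With the plain lookup vector, an out-of-range index would also give NA, but an in-range index with the wrong offset would silently pick up the wrong value.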
Solution #3: Arithmetic Expression
Yet another alternative is to devise arithmetic expressions equivalent to the mapping:
df$y <- (df$x-1L)%/%3L+1L;
df$z <- 3L--df$x%%3L;
identical(df,df1);
## [1] TRUE
This particular solution depends on the fact that your mapping happens to have a regularity that lends itself to an arithmetic description.
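The arithmetic version only holds because of that regularity. A quick sanity check, not part of the original answer, is to compare it with the lookup vectors from Solution #1 over the whole domain 1 to 9:

```r
x <- 1:9  # every possible input

# Lookup vectors from Solution #1
x2y <- rep(1:3, each = 3)
x2z <- rep(1:3, 3)

# Arithmetic expressions from Solution #3
y_arith <- (x - 1L) %/% 3L + 1L
z_arith <- 3L - -x %% 3L

print(identical(y_arith, x2y))  # TRUE
print(identical(z_arith, x2z))  # TRUE
```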
With regard to implementation, this solution also takes advantage of a somewhat non-obvious property of R's precedence rules (this is actually true of other languages as well, such as C/C++ and Java): unary minus binds more tightly than modulus, which in turn binds more tightly than binary subtraction. Thus the calculation for df$z is equivalent to 3L-((-df$x)%%3L).
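The precedence claim can be checked directly. In this small sketch for a single hypothetical input x = 4, the expression as written and the explicitly parenthesized version agree. Forcing %% to bind before the unary minus gives a different value. quote() also shows how R parsed the expression.

```r
x <- 4L

as_written <- 3L - -x %% 3L        # relies on R's precedence rules
explicit   <- 3L - ((-x) %% 3L)    # unary minus applied first, then %%
other_way  <- 3L - (-(x %% 3L))    # what we'd get if %% bound tighter than unary minus

print(c(as_written, explicit, other_way))  # 1 1 4

# Inspect the parse tree: the right operand of the binary minus is a %% call
# whose first argument is the call -x
e <- quote(3L - -x %% 3L)
print(identical(e[[3]][[1]], as.name("%%")))   # TRUE
print(identical(e[[3]][[2]], quote(-x)))       # TRUE
```

Only the precedence carries over to C/C++ and Java. Their % operator truncates toward zero, so -4 % 3 is -1 there. The trick itself therefore relies on R's %%, which returns a non-negative result for a positive divisor.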
To go into more detail on the z calculation: it is not possible to describe the mapping with a plain modulus, df$x%%3, because the inputs 3, 6, and 9 would map to zero. That could be fixed with a simple index-assign operation, but I wanted a simpler, purely arithmetic solution. Subtracting df$x%%3 from 3 would turn the zeroes into 3, but it would also mess up (invert) the remaining values, turning 1 into 2 and 2 into 1. I realized that by taking the modulus of the negated input values, we "pre-invert" them. Subtracting all of them from 3 then "rights" them again and also converts the zeroes into 3, as desired.
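The sketch below lays out the steps of that reasoning side by side for every input from 1 to 9. It also includes the index-assign alternative mentioned above. The column names are only illustrative.

```r
x <- 1:9  # every possible input

plain_mod  <- x %% 3L          # 1 2 0 1 2 0 1 2 0: the zeroes are the problem
flipped    <- 3L - x %% 3L     # 2 1 3 2 1 3 2 1 3: zeroes fixed, but 1 and 2 swapped
pre_invert <- -x %% 3L         # 2 1 0 2 1 0 2 1 0: modulus of the negated input
z_arith    <- 3L - pre_invert  # 1 2 3 1 2 3 1 2 3: swapped back, zeroes become 3

# Index-assign alternative: take the plain modulus, then overwrite the zeroes
z_assign <- plain_mod
z_assign[z_assign == 0L] <- 3L

print(data.frame(x, plain_mod, flipped, pre_invert, z_arith, z_assign))
```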
Source: https://stackoverflow.com/questions/31666823/condtionally-create-new-columns-based-on-specific-numeric-values-keys-from-exi