Question

I have a data.frame df where the column x is populated with integers (1-9). I would like to update columns y and z based on the value of x as follows:

if x is 1,2, or 3 | y = 1 ## if x is 1,4, or 7 | z = 1 
if x is 4,5, or 6 | y = 2 ## if x is 2,5, or 8 | z = 2 
if x is 7,8, or 9 | y = 3 ## if x is 3,6, or 9 | z = 3

Below is a data.frame with the desired output for y and z

df <- structure(list(x = c(1L, 2L, 3L, 3L, 4L, 2L, 1L, 2L, 5L, 2L, 
1L, 6L, 3L, 7L, 3L, 2L, 1L, 4L, 3L, 2L), y = c(1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L
), z = c(1L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 3L, 3L, 
1L, 3L, 2L, 1L, 1L, 3L, 2L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA, 
-20L))

I can write a for-loop with multiple if statements to fill y and z row by row, but that doesn't seem very R-like because it is not vectorized. Is there a way to specify which numeric values map to which new numeric values, such as a map or key that determines the new values from the existing ones?

Answer 1:

Solution #1: Lookup Vector

Assuming the mismatches I pointed out in my comment are mistakes in the data, and not in the rules, then you can accomplish this as follows:

x2y <- rep(1:3,each=3);
x2z <- rep(1:3,3);
df$y <- x2y[df$x];
df$z <- x2z[df$x];
df1 <- df; ## for identical() calls later
df;
##    x y z
## 1  1 1 1
## 2  2 1 2
## 3  3 1 3
## 4  3 1 3
## 5  4 2 1
## 6  2 1 2
## 7  1 1 1
## 8  2 1 2
## 9  5 2 2
## 10 2 1 2
## 11 1 1 1
## 12 6 2 3
## 13 3 1 3
## 14 7 3 1
## 15 3 1 3
## 16 2 1 2
## 17 1 1 1
## 18 4 2 1
## 19 3 1 3
## 20 2 1 2

The above solution is dependent on the fact that the domain of x consists of contiguous integer values beginning from 1, so a direct index into a "lookup vector" suffices. If x began at a very high number but was still contiguous you could make this solution work by subtracting one less than the minimum of x before indexing.
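The offset idea could be sketched roughly as follows. This is only an illustration: x_high is a made-up input (not part of the question's data) whose domain is assumed to be the contiguous integers 101 to 109.

```r
## hypothetical input: contiguous integer keys starting at 101 instead of 1
x_high <- c(101L, 105L, 109L, 103L, 107L)

x2y <- rep(1:3, each = 3)    ## lookup vector for y, same as above
x2z <- rep(1:3, 3)           ## lookup vector for z, same as above

offset <- min(x_high) - 1L   ## one less than the minimum of x
y <- x2y[x_high - offset]    ## shifted values 1..9 index the lookup vector
z <- x2z[x_high - offset]
y
## [1] 1 2 3 1 3
z
## [1] 1 2 3 3 1
```

Using min(x_high) assumes the smallest key actually occurs in the data; if it might not, the known lower bound of the domain would be the safer offset.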

Solution #2: Lookup Table

If you don't like this assumption, then you can accomplish the task with a lookup table:

library('data.table');
lookup <- data.table(x=1:9,y=x2y,z=x2z,key='x');
lookup;
##    x y z
## 1: 1 1 1
## 2: 2 1 2
## 3: 3 1 3
## 4: 4 2 1
## 5: 5 2 2
## 6: 6 2 3
## 7: 7 3 1
## 8: 8 3 2
## 9: 9 3 3
df[c('y','z')] <- lookup[df['x'],.(y,z)];
identical(df,df1);
## [1] TRUE

Or, using a base R approach:

lookup <- data.frame(x=1:9,y=x2y,z=x2z);
lookup;
##   x y z
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 4 2 1
## 5 5 2 2
## 6 6 2 3
## 7 7 3 1
## 8 8 3 2
## 9 9 3 3
df[c('y','z')] <- lookup[match(df$x,lookup$x),c('y','z')];
identical(df,df1);
## [1] TRUE
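One reason to prefer the match() lookup is that the keys no longer need to be contiguous or start at 1. A small sketch, assuming made-up keys (10, 20, 30) and a hypothetical lookup2 table that are not in the question; it also shows that a key missing from the table comes back as NA rather than raising an error:

```r
## hypothetical lookup table with non-contiguous keys
lookup2 <- data.frame(x = c(10L, 20L, 30L), y = c(1L, 2L, 3L))

x_new <- c(20L, 10L, 40L, 30L)   ## 40 has no entry in lookup2

## match() returns the row position of each key, or NA if it is absent
y_new <- lookup2$y[match(x_new, lookup2$x)]
y_new
## [1]  2  1 NA  3
```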

Solution #3: Arithmetic Expression

Yet another alternative is to devise arithmetic expressions equivalent to the mapping:

df$y <- (df$x-1L)%/%3L+1L;
df$z <- 3L--df$x%%3L;
identical(df,df1);
## [1] TRUE

This particular solution is dependent on the fact that your mapping happens to possess a regularity that lends itself to arithmetic description.

With regard to implementation, it also takes advantage of a somewhat non-obvious property of R's precedence rules (the same ordering holds in other languages, such as C/C++ and Java): unary minus binds tighter than modulus, which in turn binds tighter than binary subtraction. The calculation for df$z is therefore equivalent to 3L-((-df$x)%%3L).
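The precedence can be checked at the console with a single value. This is just a quick sanity check with an arbitrary x of 7, not part of the original solution:

```r
x <- 7L
-x %% 3L       ## parsed as (-x) %% 3L; R's %% takes the sign of the divisor
## [1] 2
-(x %% 3L)     ## forcing the modulus first gives a different answer
## [1] -1
3L--x%%3L      ## same as 3L - ((-x) %% 3L)
## [1] 1
```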

To go into more detail regarding the z calculation: the mapping cannot be described with a straight modulus of df$x%%3, because the inputs 3, 6, and 9 would mod to zero. That could be patched with a simple index-assign operation, but I wanted a simpler, purely arithmetic solution. To get from zero to 3 we can subtract df$x%%3 from 3, but that would invert the remaining values (1 becomes 2 and 2 becomes 1). I realized that taking the mod of the negated input "pre-inverts" the values; because R's %% returns a result with the sign of the divisor, (-df$x)%%3 is never negative. Subtracting the result from 3 then "rights" the values and also converts the zeroes into 3, as desired.
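The steps of that reasoning can be laid out for all nine inputs. This is a sketch with intermediate variable names (plain_mod, inverted, neg_mod) chosen here for illustration only:

```r
x <- 1:9
plain_mod <- x %% 3L        ## 1 2 0 1 2 0 1 2 0 : 3, 6, 9 collapse to zero
inverted  <- 3L - x %% 3L   ## 2 1 3 2 1 3 2 1 3 : zeroes fixed, 1 and 2 swapped
neg_mod   <- (-x) %% 3L     ## 2 1 0 2 1 0 2 1 0 : "pre-inverted" values
z         <- 3L - neg_mod   ## 1 2 3 1 2 3 1 2 3 : the desired mapping
```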

Source: https://stackoverflow.com/questions/31666823/condtionally-create-new-columns-based-on-specific-numeric-values-keys-from-exi

Tags

dictionary

dataframe

Condtionally create new columns based on specific numeric values (keys) from existing column
