Question
I am curious why an ifelse() statement within a call to dplyr::mutate() only seems to apply to the first row of my data frame. It returns a single value, which is recycled down the entire column. Since the expressions evaluated in either branch of the ifelse() are only valid in the context of my data frame, I would expect the condition check and the resulting expressions to be evaluated on the columns as a whole, not just on their first elements.
Here's an example: I have a variable defined outside the data frame called checkVar. Depending on the value of checkVar, I want to add different values to my data frame in a new column, z, that are computed as a function of existing columns.
If I do
library(dplyr)  # provides %>% and mutate()
checkVar <- 1
df <- data.frame( x=11:15, y=1:5 ) %>%
  dplyr::mutate( z=ifelse(checkVar == 1, x/y, x-y) )
df
it returns
   x y  z
1 11 1 11
2 12 2 11
3 13 3 11
4 14 4 11
5 15 5 11
Instead of z being the quotient of x and y for each row, all rows are populated with the quotient of x and y from the first row of the data frame.
However, if I specify rowwise(), I get the result I want:
df <- df %>%
dplyr::rowwise() %>%
dplyr::mutate( z=ifelse(checkVar == 1, x/y, x-y) ) %>%
dplyr::ungroup()
df
returns
# A tibble: 5 x 3
      x     y         z
  <int> <int>     <dbl>
1    11     1 11.000000
2    12     2  6.000000
3    13     3  4.333333
4    14     4  3.500000
5    15     5  3.000000
Why do I have to explicitly specify rowwise() when x and y are only defined as columns of my data frame?
Answer 1:
This is not really related to dplyr::mutate but to how ifelse works. From the documentation (?ifelse):
ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.
Usage
ifelse(test, yes, no)
An example:
ifelse(TRUE, c(1,2,3), c(2,3,4))
# [1] 1
Your first case is vectorized, but over the test rather than over the columns: ifelse takes the vectors x/y and x-y as its yes and no arguments, but because checkVar == 1 evaluates to a single TRUE (a length-one test), ifelse returns only (x/y)[1], i.e. the first element of the vector x/y, which is 11. That single value is then recycled by mutate to fill the new column z.
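To see that this comes from ifelse itself and not from mutate, here is a minimal base R sketch. The names scalar_result, vector_test and vector_result are made up for illustration, and stretching the test with rep() is just one way to give it the same length as the columns, not something the original post uses.

```r
x <- 11:15
y <- 1:5
checkVar <- 1

# Length-one test: the result has the shape of the test, so only the
# first element of x / y survives.
scalar_result <- ifelse(checkVar == 1, x / y, x - y)
print(scalar_result)

# Test stretched to the length of x (illustrative assumption): now ifelse
# picks an element of x / y or x - y for every position.
vector_test <- rep(checkVar == 1, length(x))
vector_result <- ifelse(vector_test, x / y, x - y)
print(vector_result)
```

The first call yields a single number, which is what mutate then has to recycle; the second yields one value per row.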
In your second case, mutate and ifelse are executed once per row, so the expression is evaluated five times, and each time it returns the value of x/y for that row.
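Roughly speaking, the row-wise version behaves like a loop over the rows. The following base R sketch only imitates that behaviour with vapply; it is not how dplyr implements rowwise() internally, and the name z_rowwise is hypothetical.

```r
x <- 11:15
y <- 1:5
checkVar <- 1

# Evaluate ifelse once per row: each call sees a single x and a single y,
# so the length-one result is exactly the value for that row.
z_rowwise <- vapply(
  seq_along(x),
  function(i) ifelse(checkVar == 1, x[i] / y[i], x[i] - y[i]),
  numeric(1)
)
print(z_rowwise)
```

Here the length-one test is harmless, because yes and no are also of length one in every call.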
If your condition is a scalar, you don't need the vectorized ifelse; a plain if/else is more suitable:
checkVar <- 1
dplyr::mutate(df, z = if (checkVar == 1) x/y else x-y)
#    x y         z
# 1 11 1 11.000000
# 2 12 2  6.000000
# 3 13 3  4.333333
# 4 14 4  3.500000
# 5 15 5  3.000000
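The same idea works without dplyr, since if/else simply returns whichever whole vector its branch produces. Below is a small base R sketch; add_z is a hypothetical helper written for this illustration, not a function from any package.

```r
df <- data.frame(x = 11:15, y = 1:5)

# add_z (hypothetical helper): choose the whole-column expression once,
# based on a single TRUE/FALSE condition.
add_z <- function(df, checkVar) {
  df$z <- if (checkVar == 1) df$x / df$y else df$x - df$y
  df
}

print(add_z(df, 1))  # z holds x / y for every row
print(add_z(df, 2))  # z holds x - y for every row
```

Note that if() expects a single TRUE or FALSE, so this pattern fits a global switch like checkVar; for a condition that varies by row, ifelse with a full-length test is the appropriate tool.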
Source: https://stackoverflow.com/questions/46614059/r-dplyrmutate-with-ifelse-conditioned-on-a-global-variable-recycles-result-fro