R dplyr::mutate with ifelse conditioned on a global variable recycles result from first row

Submitted by 我的梦境 on 2020-02-25 06:10:20

Question


I am curious why an ifelse() statement within a call to dplyr::mutate() only seems to apply to the first row of my data frame. This returns a single value, which is recycled down the entire column. Since the expressions evaluated in either case of the ifelse() are only valid in the context of my data frame, I would expect the condition check and resulting expression evaluations to be performed on the columns as a whole, not just their first elements.

Here's an example: I have a variable defined outside the data frame called checkVar. Depending on the value of checkVar, I want to add different values to my data frame in a new column, z, that are computed as a function of existing columns.

If I do

library(dplyr)  # attaches dplyr, which provides the %>% pipe
checkVar <- 1
df <- data.frame( x=11:15, y=1:5 ) %>%
  dplyr::mutate( z=ifelse(checkVar == 1, x/y, x-y) )
df

it returns

   x y  z
1 11 1 11
2 12 2 11
3 13 3 11
4 14 4 11
5 15 5 11

Instead of z being the quotient of x and y for each row, all rows are populated with the quotient of x and y from the first row of the data frame.

However, if I specify rowwise(), I get the result I want:

df <- df %>%
  dplyr::rowwise() %>%
  dplyr::mutate( z=ifelse(checkVar == 1, x/y, x-y) ) %>%
  dplyr::ungroup()
df

returns

# A tibble: 5 x 3
      x     y         z
  <int> <int>     <dbl>
1    11     1 11.000000
2    12     2  6.000000
3    13     3  4.333333
4    14     4  3.500000
5    15     5  3.000000

Why do I have to explicitly specify rowwise() when x and y are only defined as columns of my data frame?


Answer 1:


This is not really related to dplyr::mutate but to how ifelse works. From the documentation (?ifelse):

ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.

Usage

ifelse(test, yes, no)

An example:

ifelse(T, c(1,2,3), c(2,3,4))
# [1] 1
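To make the "same shape as test" rule concrete, here is a small base-R sketch contrasting a length-one test with a length-three test; the variable names are illustrative only.

```r
# The length of the result follows `test`, not `yes` or `no`.
yes <- c(1, 2, 3)
no  <- c(2, 3, 4)

scalar_result <- ifelse(TRUE, yes, no)                  # test has length 1
vector_result <- ifelse(c(TRUE, FALSE, TRUE), yes, no)  # test has length 3

print(scalar_result)  # [1] 1      (only yes[1] is used)
print(vector_result)  # [1] 1 3 3  (yes[1], no[2], yes[3])
```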

In your first case, ifelse receives the whole vectors x/y and x-y as its yes and no arguments. Because checkVar == 1 is a single TRUE (a length-one test), ifelse returns a length-one result, (x/y)[1], i.e. the first element of x/y, which is 11. mutate then recycles that single value to fill the new column z.
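A minimal base-R sketch of the same mechanism outside of dplyr, assuming the question's x, y and checkVar values. data.frame() stands in for mutate() here only because both recycle a length-one value to the number of rows.

```r
x <- 11:15
y <- 1:5
checkVar <- 1

# Length-one test -> length-one result: only (x / y)[1] survives.
z_ifelse <- ifelse(checkVar == 1, x / y, x - y)
print(length(z_ifelse))  # [1] 1
print(z_ifelse)          # [1] 11

# Building a 5-row data frame recycles that single value down the column.
df_base <- data.frame(x = x, y = y, z = z_ifelse)
print(df_base$z)         # [1] 11 11 11 11 11
```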

In your second case, rowwise() makes mutate evaluate the expression once per row, so ifelse is called five times, each time with length-one x and y, and each call returns x/y for that row.
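As a rough base-R analogue (not a description of how rowwise() is implemented internally), mapply() calls ifelse() once per (x, y) pair, so every call sees length-one inputs and returns that row's x/y:

```r
x <- 11:15
y <- 1:5
checkVar <- 1

# One ifelse() call per row: each call gets a single xi and a single yi.
z_per_row <- mapply(function(xi, yi) ifelse(checkVar == 1, xi / yi, xi - yi), x, y)
print(z_per_row)
# [1] 11.000000  6.000000  4.333333  3.500000  3.000000
```

This gives the right answer, but it pays for five separate calls where one vectorized expression would do.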


If your condition is a scalar, you don't need the vectorized ifelse; a plain if/else is more suitable:

checkVar <- 1
mutate(df, z = if(checkVar == 1) x/y else x-y)

#   x y         z
#1 11 1 11.000000
#2 12 2  6.000000
#3 13 3  4.333333
#4 14 4  3.500000
#5 15 5  3.000000
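Because if/else picks one whole formula up front, whichever branch runs stays vectorized over all rows. A sketch using a hypothetical helper, compute_z(), that makes the choice explicit (base R only, so it runs without dplyr):

```r
# Hypothetical helper: choose the formula once from a scalar flag,
# then apply it to the full x and y vectors.
compute_z <- function(x, y, checkVar) {
  if (checkVar == 1) x / y else x - y
}

z_ratio <- compute_z(11:15, 1:5, checkVar = 1)
z_diff  <- compute_z(11:15, 1:5, checkVar = 2)
print(z_ratio)  # [1] 11.000000  6.000000  4.333333  3.500000  3.000000
print(z_diff)   # [1] 10 10 10 10 10
```

Inside a pipeline it could be called as mutate(df, z = compute_z(x, y, checkVar)), assuming dplyr is attached.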


Source: https://stackoverflow.com/questions/46614059/r-dplyrmutate-with-ifelse-conditioned-on-a-global-variable-recycles-result-fro
