问题
I am currently trying to apply the summarise function in order to isolate the relevant observations from a large data set. A simple reproducible example is given here:
df <- data.frame(c(1,1,1,2,2,2,3,3,3), as.logical(c(TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)),
as.numeric(c(0,5,0,0,0,0,7,0,7)))
colnames(df) <- c("ID", "Status", "Price")
ID Status Price
1 1 TRUE 0
2 1 FALSE 5
3 1 TRUE 0
4 2 TRUE 0
5 2 TRUE 0
6 2 TRUE 0
7 3 FALSE 7
8 3 TRUE 0
9 3 FALSE 7
I would like to sort the table by observation and get the status TRUE only if all three observations are TRUE (figured out) and then want to get the price corresponding to the status (i.e. 5 for observation 1 as FALSE, 0 for observation 2 as TRUE and 7 for observation 3 as FALSE).
From Summarize with conditions in dplyr I have figured out that I can - just as usually - specify the conditions in square brackets. My code so far thus looks like this:
library(dplyr)
result <- df %>%
group_by(ID) %>%
summarize(Status = all(Status), Test = ifelse(all(Status) == TRUE,
first(Price[Status == TRUE]), first(Price[Status == FALSE])))
# This is what I get:
# A tibble: 3 x 3
ID Status Test
<dbl> <lgl> <dbl>
1 1. FALSE 0.
2 2. TRUE 0.
3 3. FALSE 7.
But as you can see, for ID = 1 it gives an incorrect price. I have been trying this forever, so I would appreciate any hint as to where I have been going wrong.
回答1:
We could keep the all(Status)
as second argument in summarise
(or change the column name) and also, it can be done with if/else
as the logic seems to return a single TRUE/FALSE based on whether all
of the 'Status' is TRUE or not
df %>%
group_by(ID) %>%
summarise( Test = if(all(Status)) first(Price[Status]) else
first(Price[!Status]), Status = all(Status))
# A tibble: 3 x 3
# ID Test Status
# <dbl> <dbl> <lgl>
#1 1 5 FALSE
#2 2 0 TRUE
#3 3 7 FALSE
NOTE: It is better not to use ifelse
with unequal lengths for its arguments
回答2:
Could do:
df %>%
group_by(ID) %>%
mutate(status = Status) %>%
summarise(
Status = all(Status),
Test = ifelse(Status == TRUE,
first(Price),
first(Price[status == FALSE]))
)
Output:
# A tibble: 3 x 3
ID Status Test
<dbl> <lgl> <dbl>
1 1 FALSE 5
2 2 TRUE 0
3 3 FALSE 7
The issue is that you want to use Status
for Test
column while you've already modified it so that it doesn't contain original values anymore.
Make a copy before (I've saved it in status
), execute ifelse
on it and it'll run fine.
来源:https://stackoverflow.com/questions/54824804/using-dplyr-summarise-with-conditions