Question

I have a data frame like the one below:

Records:

ID        Remarks         Value
1         ABC             10
1         AAB             12
1         ZZX             15
2         XYZ             12
2         ABB             14

Using this data frame, I want to add a new column, Status, to the existing data frame.

If Remarks is ABC, AAB or ABB, Status should be TRUE; for XYZ and ZZX it should be FALSE.

I tried the following, but it didn't work:

Records$Status<-ifelse(Records$Remarks %in% ("ABC","AAB","ABB"),"TRUE",
                             ifelse(Records$Remarks %in% 
                      ("XYZ","ZZX"),"FALSE"))

Then, based on Status, I want to derive the following output:

ID     TRUE    FALSE    Sum
1       2       1        37
2       1       1        26

Answer 1:

Records$Status<-ifelse(Records$Remarks %in% c("ABC","AAB","ABB"),TRUE,
                        ifelse(Records$Remarks %in% 
                                   c("XYZ","ZZX"),FALSE, NA))

You need to wrap your lists of strings in c(): a bare ("ABC","AAB","ABB") is a syntax error in R. You also need an "else" value for the second ifelse, since ifelse() has no default for its no argument (but see Roman's answer below for a better way of doing this with case_when). Also note that I changed the character strings "TRUE" and "FALSE" into the logical values TRUE and FALSE.
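A self-contained sketch of the fix, assuming Records is rebuilt from the question's table with data.frame() (that construction is ours, not the asker's). It also shows why the else value matters: ifelse() has no default for its no argument, so it fails as soon as any element of the test is FALSE.

```r
# Rebuild the question's data (our construction from the table in the question)
Records <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L),
  Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
  Value   = c(10L, 12L, 15L, 12L, 14L)
)

# c() builds real character vectors, and the inner ifelse() gets NA as its `no` value
Records$Status <- ifelse(Records$Remarks %in% c("ABC", "AAB", "ABB"), TRUE,
                         ifelse(Records$Remarks %in% c("XYZ", "ZZX"), FALSE, NA))
print(Records)

# Without a `no` argument, ifelse() errors once any test element is FALSE
missing_no <- tryCatch(ifelse(c(TRUE, FALSE), "yes"),
                       error = function(e) conditionMessage(e))
print(missing_no)
```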

For the summary (using dplyr):

library(dplyr)

Records %>% group_by(ID) %>% 
  dplyr::summarise(trues=sum(Status), falses=sum(!Status), sum=sum(Value))

# A tibble: 2 x 4
     ID trues falses   sum
  <int> <int>  <int> <int>
1     1     2      1    37
2     2     1      1    26
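The summarise() call works because sum() treats TRUE as 1 and FALSE as 0. A small base-R check of that, with one caveat: if Status can be NA (for example from the NA else value in the ifelse() code above), the plain sum becomes NA, so na.rm = TRUE would be needed inside summarise(). The status vector below is hypothetical.

```r
# Hypothetical Status vector with one unmatched remark (NA)
status <- c(TRUE, TRUE, FALSE, NA)

n_true  <- sum(status, na.rm = TRUE)   # TRUE counts as 1, FALSE as 0
n_false <- sum(!status, na.rm = TRUE)  # negate first, then count
n_plain <- sum(status)                 # a single NA makes the plain sum NA

print(c(n_true = n_true, n_false = n_false, n_plain = n_plain))
```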

Of course, if you don't really need the intermediate Status column but just want the summary table, you can skip the first step altogether:

Records %>% group_by(ID) %>% 
dplyr::summarise(trues=sum(Remarks %in% c("ABC","AAB","ABB")), 
  falses=sum(Remarks %in% c("XYZ","ZZX")), 
  sum=sum(Value))
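For comparison, a base-R sketch of the same one-step summary using rowsum(), which sums the rows of a numeric matrix within groups; cbind() coerces the two logical tests to 0/1 integers. The data.frame() construction of Records is our assumption, rebuilt from the question's table.

```r
Records <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L),
  Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
  Value   = c(10L, 12L, 15L, 12L, 14L)
)

# One column per quantity; logical tests become 0/1 inside the integer matrix
per_row <- cbind(
  trues  = Records$Remarks %in% c("ABC", "AAB", "ABB"),
  falses = Records$Remarks %in% c("XYZ", "ZZX"),
  sum    = Records$Value
)

# rowsum() adds up the rows within each ID (row names of the result are the IDs)
summary_tbl <- rowsum(per_row, Records$ID)
print(summary_tbl)
```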

Answer 2:

Since it makes sense to use dplyr for your second question (see @iod's answer), this is also a good opportunity to use the package's straightforward case_when() function for the first part.

Records %>% 
    mutate(Status = case_when(Remarks %in% c("ABC", "AAB", "ABB") ~ TRUE,
                              Remarks %in% c("XYZ", "ZZX") ~ FALSE,
                              TRUE ~ NA))

  ID Remarks Value Status
1  1     ABC    10   TRUE
2  1     AAB    12   TRUE
3  1     ZZX    15  FALSE
4  2     XYZ    12  FALSE
5  2     ABB    14   TRUE

Answer 3:

This approach will scale to a large number of remarks.

Load the data and prepare a matching data frame

The second data frame maps each remark to its TRUE or FALSE value.

library(readr)
library(dplyr)
library(tidyr)
dtf <- read_table("id        remarks         value
1         ABC             10
1         AAB             12
1         ZZX             15
2         XYZ             12
2         ABB             14")
# Lookup table mapping each remark to its TRUE/FALSE status
truefalse <- tibble(remarks = c("ABC", "AAB", "ABB", "ZZX", "XYZ"),
                    tf = c(TRUE, TRUE, TRUE, FALSE, FALSE))
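In base R, the same mapping idea can be sketched with a named vector instead of a second data frame. Indexing by name returns NA for any remark missing from the table, which mirrors what left_join() does with unmatched rows. The remark "QQQ" is a hypothetical unknown value.

```r
# Named logical vector as a lookup table: names are remarks, values are the status
lookup <- c(ABC = TRUE, AAB = TRUE, ABB = TRUE, ZZX = FALSE, XYZ = FALSE)

remarks <- c("ABC", "AAB", "ZZX", "XYZ", "ABB", "QQQ")  # "QQQ" is not in the table
status  <- unname(lookup[remarks])                     # unknown names give NA
print(status)
```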

Group by id and summarise

This gives the format asked for in the question:

dtf %>% 
    left_join(truefalse, by = "remarks") %>% 
    group_by(id) %>% 
    summarise(true = sum(tf),
              false = sum(!tf),
              value = sum(value)) 

# A tibble: 2 x 4
     id  true false value
  <int> <int> <int> <int>
1     1     2     1    37
2     2     1     1    26
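For readers without the tidyverse, a rough base-R counterpart of this pipeline on the same data: merge(..., all.x = TRUE) plays the role of left_join() (it keeps every row of dtf, although rows come back sorted by the key), and tapply() does the per-id sums. The names joined and out are ours.

```r
dtf <- data.frame(id      = c(1L, 1L, 1L, 2L, 2L),
                  remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
                  value   = c(10L, 12L, 15L, 12L, 14L))
truefalse <- data.frame(remarks = c("ABC", "AAB", "ABB", "ZZX", "XYZ"),
                        tf      = c(TRUE, TRUE, TRUE, FALSE, FALSE))

# Left join: all.x = TRUE keeps rows of dtf that have no match (tf would be NA)
joined <- merge(dtf, truefalse, by = "remarks", all.x = TRUE)

# tapply() applies sum() within each id; groups come out in sorted id order
out <- data.frame(
  id    = sort(unique(joined$id)),
  true  = as.vector(tapply(joined$tf, joined$id, sum)),
  false = as.vector(tapply(!joined$tf, joined$id, sum)),
  value = as.vector(tapply(joined$value, joined$id, sum))
)
print(out)
```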

Alternative proposal: group by id, tf and summarise

This option keeps more detail on how value is distributed across the grouping variables id and tf.

dtf %>% 
    left_join(truefalse, by = "remarks") %>% 
    group_by(id, tf) %>% 
    summarise(n = n(),
              value = sum(value))

# A tibble: 4 x 4
# Groups:   id [?]
     id tf        n value
  <int> <lgl> <int> <int>
1     1 FALSE     1    15
2     1 TRUE      2    22
3     2 FALSE     1    12
4     2 TRUE      1    14
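The same id × tf breakdown can be sketched in base R as two cross-tables: table() counts rows per cell, and xtabs() with a left-hand side sums value per cell. To keep the block self-contained, it rebuilds dtf and derives tf from a named-vector lookup (our choice, not part of the answer).

```r
dtf <- data.frame(id      = c(1L, 1L, 1L, 2L, 2L),
                  remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
                  value   = c(10L, 12L, 15L, 12L, 14L),
                  stringsAsFactors = FALSE)

# Derive tf by looking each remark up by name
lookup <- c(ABC = TRUE, AAB = TRUE, ABB = TRUE, ZZX = FALSE, XYZ = FALSE)
dtf$tf <- unname(lookup[dtf$remarks])

# Row counts per (id, tf), and the sum of value per (id, tf)
n_tab     <- table(id = dtf$id, tf = dtf$tf)
value_tab <- xtabs(value ~ id + tf, data = dtf)
print(n_tab)
print(value_tab)
```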

Answer 4:

In most cases, life is easier and lines are shorter without ifelse:

# short version
df$Status <- df$Remarks %in% c("ABC","AAB","ABB")
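A quick check of the short version on a vector that also contains an unknown remark and an NA (both values are hypothetical). Because %in% never returns NA, both come out FALSE:

```r
remarks <- c("ABC", "XYZ", "garbage", NA)    # hypothetical values
status  <- remarks %in% c("ABC", "AAB", "ABB")
print(status)
```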

This version is OK for most purposes, but it has a shortcoming: Status will be FALSE if Remarks is NA or, say, "garbage", whereas one might want NA in those cases and FALSE only when Remarks %in% c("XYZ", "ZZX"). To get that, one can combine the two conditions arithmetically and convert the result to logical:

df$Status <- with(df, {
  is_true  <- Remarks %in% c("ABC","AAB","ABB")
  is_false <- Remarks %in% c("XYZ","ZZX")
  as.logical(is_true / (is_true + is_false))  # 1/1 -> TRUE, 0/1 -> FALSE, 0/0 (NaN) -> NA
})

And the summary table with base R:

aggregate(df[,-(1:2)], df["ID"], function(x) if(is.numeric(x)) sum(x) else table(x))

Umm... perhaps some formatting would be useful:

t1 <- aggregate(df[,-(1:2)], df["ID"], function(x) if(is.numeric(x)) sum(x) else table(x))
t1 <- t1[, c(1,3,2)]
colnames(t1) <- c("ID", "", "Sum")
t1
#   ID FALSE TRUE Sum
# 1  1     1    2  37
# 2  2     1    1  26
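One caveat: table() only counts values that occur in a group, so an ID with no FALSE rows would give a one-cell table, and aggregate() would then fall back to a list column. A sketch that fixes the levels up front with factor() and builds the question's exact ID / TRUE / FALSE / Sum layout; the names status_f, counts, sums and out are ours.

```r
df <- data.frame(ID      = c(1L, 1L, 1L, 2L, 2L),
                 Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
                 Value   = c(10L, 12L, 15L, 12L, 14L),
                 Status  = c(TRUE, TRUE, FALSE, FALSE, TRUE))

# Without fixed levels, a group with only TRUE rows yields a one-cell table
one_cell <- table(c(TRUE, TRUE))

# Fixed levels guarantee a TRUE and a FALSE count for every ID
status_f <- factor(df$Status, levels = c(TRUE, FALSE))
counts   <- table(ID = df$ID, Status = status_f)
sums     <- tapply(df$Value, df$ID, sum)

out <- data.frame(ID      = as.integer(rownames(counts)),
                  `TRUE`  = as.vector(counts[, "TRUE"]),
                  `FALSE` = as.vector(counts[, "FALSE"]),
                  Sum     = as.vector(sums),
                  check.names = FALSE)
print(out)
```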

Answer 5:

This returns the correct result only if the remarks fall into the two groups mentioned ("ABC", "AAB", "ABB" vs. "XYZ", "ZZX", ...), because every remark that does not match the regular expression is counted as FALSE. For me, @iod's solution is more R-like, but I tried to avoid ifelse and do it another way:

Code:

library(tidyverse)

dt %>%
  group_by(ID, Status = grepl("^A[AB][CB]$", Remarks)) %>%
  summarise(N = n(), Sum = sum(Value)) %>%
  spread(Status, N) %>%
  summarize_all(sum, na.rm = TRUE) %>%                    # data is still grouped by ID
  select("ID", "TRUE", "FALSE", "Sum")

# A tibble: 2 x 4
     ID `TRUE` `FALSE`   Sum
  <int>  <int>   <int> <int>
1     1      2       1    37
2     2      1       1    26
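The pattern ^A[AB][CB]$ reads as: an A, then A or B, then C or B. It therefore matches ABC, AAB and ABB, but also AAC, and grepl() sends every non-matching remark (including NA) to FALSE. A quick check, where "AAC" is a hypothetical extra remark:

```r
remarks <- c("ABC", "AAB", "ZZX", "XYZ", "ABB", "AAC")
matched <- grepl("^A[AB][CB]$", remarks)
print(setNames(matched, remarks))
```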

Data:

dt <- structure(
  list(ID = c(1L, 1L, 1L, 2L, 2L), 
       Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
       Value = c(10L, 12L, 15L, 12L, 14L)), 
  .Names = c("ID", "Remarks", "Value"), class = "data.frame", row.names = c(NA, -5L)
  )

Source: https://stackoverflow.com/questions/53064595/how-to-check-multiple-values-using-if-condition

Tags

dplyr