case_when with partial string match and contains()

Posted by 自古美人都是妖i on 2021-01-27 18:09:01

Question


I'm working with a dataset that has many columns called status1, status2, etc. Within those columns, it says if someone is exempt, complete, registered, etc.

Unfortunately, the exempt inputs are not consistent; here's a sample:

library(dplyr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

I'm trying to use case_when() to make a new column that has their final status. If it ever says completed, then they are completed. If it ever says exempt without saying complete, then they are exempt.

The important part is that I want my code to use contains("status"), or some equivalent that only targets the status columns and doesn't require typing them all, and I want it to only require a partial string match for exempt.

As for using contains with case_when, I saw this example, but I wasn't able to apply it to my case: mutate with case_when and contains

This is what I've tried to use so far, but as you can guess, it has not worked:

library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
  mutate(final= case_when(pmap_chr(select(., contains("status")), ~
    any(c(...) == str_detect(., "Exempt") ~ "Exclude",
               TRUE ~ "Complete"
  ))))

Here's what I want the final product to look like:

solution <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                   status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                   status2 = c("exempt", "Completed", "Completed", "Pending"),
                   status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
                   final = c("Exclude", "Completed", "Completed", "Exclude")) 

Thank you!


Answer 1:


I think you are doing it backwards. Put case_when inside pmap_chr instead of the other way around:

library(dplyr)
library(purrr)
library(stringr)

problem %>%
  mutate(final = pmap_chr(select(., contains("status")), 
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))

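In each pmap_chr iteration (one per row of problem), c(...) collects that row's status values, and case_when checks whether any of them contains the string Exempt. The (?i) prefix makes the str_detect pattern case-insensitive; it is equivalent to writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)). Note that every row without an exempt match falls through to "Completed", which is fine for this sample but would also label an all-"Pending" row as completed.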
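To check that the inline (?i) flag and regex(..., ignore_case = TRUE) really behave the same, here is a small sketch run on the status values from the question. The names statuses, inline_flag and regex_flag are made up for illustration.

```r
library(stringr)

# Sample status values taken from the question (hypothetical vector name)
statuses <- c("7EXEMPT", "exempt", "EXEMPTED", "ExempT - 14",
              "Completed", "Pending")

# Case-insensitive match via the inline (?i) flag
inline_flag <- str_detect(statuses, "(?i)Exempt")

# The same match via the regex() helper
regex_flag <- str_detect(statuses, regex("Exempt", ignore_case = TRUE))

# Both approaches should flag exactly the same elements
identical(inline_flag, regex_flag)
```

Either form works; regex() tends to be easier to read when the pattern itself is already long.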

Output:

# A tibble: 4 x 5
  person status1   status2   status3     final    
  <chr>  <chr>     <chr>     <chr>       <chr>    
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude  
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
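If you are on dplyr 1.0.4 or later, a column-wise alternative to the row-wise pmap_chr approach is if_any(). The following is only a sketch and not part of the original answer. It also checks for "complete" first, since the question says a completed status should win over an exempt one. The NA fallback for rows matching neither pattern is my own assumption.

```r
library(dplyr)
library(stringr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

solution <- problem %>%
  mutate(final = case_when(
    # Any status column mentions "complete" (case-insensitive): completed wins
    if_any(contains("status"),
           ~ str_detect(.x, regex("complete", ignore_case = TRUE))) ~ "Completed",
    # Otherwise, any status column mentions "exempt": exclude
    if_any(contains("status"),
           ~ str_detect(.x, regex("exempt", ignore_case = TRUE))) ~ "Exclude",
    # Assumption: rows matching neither pattern get NA rather than a label
    TRUE ~ NA_character_
  ))

solution
```

Because if_any() works on whole columns, this avoids iterating row by row, which may matter on larger datasets.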


Source: https://stackoverflow.com/questions/56588108/case-when-with-partial-string-match-and-contains
