R - Recoding column with multiple text values associated with one code

Posted by 自作多情 on 2019-12-22 14:05:28

Question


I'm trying to recode a column to determine the shift of an employee.

The data is messy, and the word I am looking for must be extracted from the surrounding text. I've tried various approaches with if statements and the stringr and dplyr packages, but can't figure out how to get them to work together.

I have this line of code, but str_match doesn't produce a true/false value.

Data$Shift <- if(str_match(Data$Unit, regex(first, ignore_case = TRUE))) {
    print("First Shift")
  } else {
    print("Lame")
  }

recode is working, but I have multiple values I need to recode, and I want to learn whether there is a way to incorporate stringr into the recode function.

Data$Shift1 <- recode(Data$Unit, "1st" = "First Shift")

For First Shift, the Unit text may contain 1st, First, or first, so the keyword has to be picked out of the surrounding text. My data looks like the Unit column, and I want to recode it into the Shift column:

Unit                        Shift
Detention, Third Shift      Third Shift
D, 3rd Shift                Third Shift
1st                         First Shift
first shift                 First Shift
First Shift                 First Shift
1st shift                   First Shift
1st Shifft                  First Shift

Answer 1:


I'd recommend just using grepl with case_when within dplyr.

library(dplyr)

Data %>% 
  mutate(Shift = case_when(grepl("first|1st", Unit, ignore.case = TRUE) ~ "First Shift",
                           grepl("third|3rd", Unit, ignore.case = TRUE) ~ "Third Shift",
                           TRUE                                         ~ "Neither"))
  • mutate creates our new column Shift

  • grepl returns a logical vector indicating whether each element matches the pattern. In this case, the pattern I used was "first|1st". The | character means OR, so as written it checks for either "first" OR "1st".

  • case_when works like multiple "if" statements, allowing us to keep our logic together (similar to SQL syntax). The final line of case_when is our safety net here: if a value of Unit matches neither the first-shift nor the third-shift pattern, it returns "Neither", so we know to investigate further.
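
To make the three points above concrete, here is a minimal reproducible sketch. It assumes the data frame is called Data and holds only the Unit values shown in the question; the names is_first and Result are introduced here purely for illustration.

```r
library(dplyr)

# Sample data copied from the Unit column in the question
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft"),
  stringsAsFactors = FALSE
)

# grepl gives one TRUE/FALSE per row, which is what case_when needs
is_first <- grepl("first|1st", Data$Unit, ignore.case = TRUE)

# Conditions are checked top to bottom; the first TRUE one decides the value
Result <- Data %>%
  mutate(Shift = case_when(
    grepl("first|1st", Unit, ignore.case = TRUE) ~ "First Shift",
    grepl("third|3rd", Unit, ignore.case = TRUE) ~ "Third Shift",
    TRUE                                         ~ "Neither"
  ))

print(Result)
```

Note that the pipe does not modify Data in place, so the result has to be assigned (here to Result, or back to Data) to keep the new column.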

If you don't have a recent version of dplyr (>0.7.3), then case_when might not work for you. In that case, we can replace case_when with nested ifelse calls.

Data %>% 
  mutate(Shift = ifelse(grepl("first|1st", Unit, ignore.case = TRUE),
                        "First Shift",
                        ifelse(grepl("third|3rd", Unit, ignore.case = TRUE),
                               "Third Shift",
                               "Neither")))

Not as pretty, but the result should be the same: like case_when, the nested ifelse takes the first condition that matches, and our grepl patterns are mutually exclusive here anyway.
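
Since the question asked about stringr specifically, the same logic can probably be written with str_detect in place of grepl. str_match returns a character matrix of the matched text, whereas str_detect returns a logical vector, which is what a condition needs. The sketch below is only an illustration: the names first_pat, third_pat, and Result are made up, and the "Second shift" row is a hypothetical value added to show the fallback branch.

```r
library(dplyr)
library(stringr)

# Two values from the question plus one hypothetical unmatched value
Data <- data.frame(
  Unit = c("D, 3rd Shift", "1st Shifft", "Second shift"),
  stringsAsFactors = FALSE
)

# regex(..., ignore_case = TRUE) plays the role of ignore.case = TRUE in grepl
first_pat <- regex("first|1st", ignore_case = TRUE)
third_pat <- regex("third|3rd", ignore_case = TRUE)

Result <- Data %>%
  mutate(Shift = case_when(
    str_detect(Unit, first_pat) ~ "First Shift",
    str_detect(Unit, third_pat) ~ "Third Shift",
    TRUE                        ~ "Neither"
  ))

print(Result)
```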




Answer 2:


Keep it simple:

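# Start from NA so unmatched rows are easy to spot, then fill in by pattern
Data$Shift <- NA_character_
Data$Shift[grepl("third|3rd", Data$Unit, ignore.case = TRUE)] <- "Third Shift"
Data$Shift[grepl("first|1st", Data$Unit, ignore.case = TRUE)] <- "First Shift"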

Etc.
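
One way to handle the "Etc." without repeating a line per shift is to keep the patterns in a named vector and loop over it in base R. This is a sketch under assumptions: the vector name patterns is made up, and the "Second Shift" entry is hypothetical, since the question only shows first and third shift values.

```r
# Sample data copied from the Unit column in the question
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft"),
  stringsAsFactors = FALSE
)

# Names are the target codes, values are the patterns that map to them.
# "Second Shift" is an assumed extra entry, not present in the question's data.
patterns <- c(
  "First Shift"  = "first|1st",
  "Second Shift" = "second|2nd",
  "Third Shift"  = "third|3rd"
)

# Rows that match no pattern stay NA
Data$Shift <- NA_character_
for (label in names(patterns)) {
  hits <- grepl(patterns[[label]], Data$Unit, ignore.case = TRUE)
  Data$Shift[hits] <- label
}

print(Data)
```

If a value could match more than one pattern, the last matching entry in patterns wins, so the order of the vector matters in that case.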



Source: https://stackoverflow.com/questions/49632442/r-recoding-column-with-multiple-text-values-associated-with-one-code
