Question
I'm trying to recode a column to determine the shift of an employee.
The data is messy and the word I am looking for must be extracted from the text. I've been trying various routes with if statements and the stringr and dplyr packages, but can't figure out how to get them to work together.
I have this line of code, but str_match doesn't produce a true/false value.
Data$Shift <- if(str_match(Data$Unit, regex(first, ignore_case = TRUE))) {
print("First Shift")
} else {
print("Lame")
}
recode is working, but I have multiple values I need to recode and want to learn if there is a way to incorporate stringr into the recode function.
Data$Shift1 <- recode(Data$Unit, "1st" = "First Shift")
The text must be extracted from the column: any value containing 1st, First, or first should become First Shift. My data looks like the Unit column, and I want to recode it into the Shift column:
Unit                      Shift
Detention, Third Shift    Third Shift
D, 3rd Shift              Third Shift
1st                       First Shift
first shift               First Shift
First Shift               First Shift
1st shift                 First Shift
1st Shifft                First Shift
Answer 1:
I'd recommend just using grepl with case_when within dplyr.
library(dplyr)
Data %>%
mutate(Shift = case_when(grepl("first|1st", Unit, ignore.case = TRUE) ~ "First Shift",
grepl("third|3rd", Unit, ignore.case = TRUE) ~ "Third Shift",
TRUE ~ "Neither"))
mutate creates our new column, Shift.
grepl returns a logical vector indicating whether each element matches the pattern. In this case, the pattern I used was "first|1st". The | character means OR, so as is, that checks for either "first" OR "1st".
case_when works like multiple "if" statements, allowing us to keep our logic together (similar to SQL syntax). The final line of case_when is kind of our safety net here: if a value for Unit does not contain 1st or 3rd shift, it will return "Neither", and so we know to investigate further.
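As a quick check of the explanation above, here is a minimal sketch that rebuilds the Unit column from the question's sample rows and applies the same case_when logic. It assumes dplyr is installed; the data frame is reconstructed by hand from the question, not taken from the asker's real data.

```r
library(dplyr)

# Sample rows copied from the Unit column shown in the question
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft"),
  stringsAsFactors = FALSE
)

# grepl() is vectorised: it returns one TRUE/FALSE per row
grepl("first|1st", Data$Unit, ignore.case = TRUE)
# FALSE FALSE TRUE TRUE TRUE TRUE TRUE

# case_when() checks the conditions in order and uses the first that is TRUE
Data <- Data %>%
  mutate(Shift = case_when(
    grepl("first|1st", Unit, ignore.case = TRUE) ~ "First Shift",
    grepl("third|3rd", Unit, ignore.case = TRUE) ~ "Third Shift",
    TRUE ~ "Neither"
  ))

Data$Shift
# "Third Shift" "Third Shift" "First Shift" "First Shift" "First Shift"
# "First Shift" "First Shift"
```

Note that the misspelled "1st Shifft" is still recoded correctly, because only the "1st" part has to match.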
If you don't have a recent version of dplyr (>0.7.3), then case_when might not work for you. If so, we can replace case_when with a chain of nested ifelse.
Data %>%
mutate(Shift = ifelse(grepl("first|1st", Unit, ignore.case = TRUE),
"First Shift",
ifelse(grepl("third|3rd", Unit, ignore.case = TRUE),
"Third Shift",
"Neither")))
Not as pretty, but should be the same result since our patterns used in grepl are mutually exclusive.
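Since the question asked specifically about stringr, here is a hedged sketch of the same logic with str_detect swapped in for grepl. Unlike str_match, which returns a matrix of the matched text, str_detect returns one TRUE/FALSE per element, so it can be used as a case_when condition. This assumes both dplyr and stringr are installed; the "Second" row is a hypothetical value added only to show the fallback branch.

```r
library(dplyr)
library(stringr)

# Two rows from the question plus one hypothetical non-matching row
Data <- data.frame(
  Unit = c("D, 3rd Shift", "1st Shifft", "Second"),
  stringsAsFactors = FALSE
)

# regex(..., ignore_case = TRUE) plays the role of grepl's ignore.case
first_pattern <- regex("first|1st", ignore_case = TRUE)
third_pattern <- regex("third|3rd", ignore_case = TRUE)

Result <- Data %>%
  mutate(Shift = case_when(
    str_detect(Unit, first_pattern) ~ "First Shift",
    str_detect(Unit, third_pattern) ~ "Third Shift",
    TRUE ~ "Neither"
  ))

Result$Shift
# "Third Shift" "First Shift" "Neither"
```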
Answer 2:
Keep it simple:
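Data$Shift <- NA_character_  # create the new column before filling it in
Data$Shift[grepl("3rd", Data$Unit)] <- "Third Shift"  # match on Unit, write to Shift
Data$Shift[grepl("1st", Data$Unit)] <- "First Shift"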
Etc.
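One possible way to spell out the "Etc." without repeating a line per spelling is to loop over a named vector of patterns. This is only a sketch in base R: shift_patterns is a hypothetical lookup introduced here for illustration, and the patterns are borrowed from the first answer.

```r
# Sample rows copied from the Unit column shown in the question
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are the target labels, values are the patterns
shift_patterns <- c("First Shift" = "first|1st",
                    "Third Shift" = "third|3rd")

# Rows that match no pattern stay NA, which flags them for review
Data$Shift <- NA_character_
for (label in names(shift_patterns)) {
  hit <- grepl(shift_patterns[[label]], Data$Unit, ignore.case = TRUE)
  Data$Shift[hit] <- label
}

Data$Shift
```

Adding another shift then only means adding one entry to shift_patterns.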
Source: https://stackoverflow.com/questions/49632442/r-recoding-column-with-multiple-text-values-associated-with-one-code