Question
New-ish to R, I have a question about data cleaning.
I have a column that records each car's drive type: four wheel, all wheel, 2 wheel, etc.
The problem is that there is no standardization, so some rows have 4 WHEEL drive, 4wd, 4WD, Four - Wheel - Drive, etc.
The first step is easy, which is to uppercase everything, but the step I'm having trouble with is changing each value to a standard, like 4WD, without having to recode each unique drive.
Something like: for each value in the column, if the value LIKE/CONTAINS "FOUR", change it to "4WD".
I've researched recode, stringdist and mutate, but I can't find a fit. Now that I've typed it out, it sounds like I need a loop, but I'm not sure of the exact syntax.
If the solution could work with the tidyverse that would be great!
Answer 1:
Welcome to StackOverflow! I've answered your question, but in the future, please include a small sample of your data so it's easier for us to solve your problem. Food for thought: How to make a reproducible example
library(dplyr)
# Since you haven't provided a data sample, I'm going to assume your dataframe is named "DF" and your column's name is "Drive"
# Set everything to lowercase to pare down uniqueness
DF <- mutate(DF, Drive = tolower(Drive))
# You'll need one line like this for each replacement. Of the following form:
# <column_name> = replace(<column_name>, <condition>, <new value>)
DF <- mutate(DF, Drive = replace(Drive, Drive == "4 wheel drive", "4WD"))
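The one-line-per-value approach above gets long when spellings differ only in case, spacing, or punctuation. A minimal sketch of one way to shrink the list first: normalize each value to a key, then map keys to labels with a named vector. The sample data, the Drive_clean column, and the lookup labels are assumptions for illustration, not part of the original question.

```r
# Hypothetical sample data; the name "Drive" follows the answer's assumption
DF <- data.frame(
  Drive = c("4 WHEEL drive", "4wd", "4WD", "Four - Wheel - Drive", "AWD"),
  stringsAsFactors = FALSE
)

# Lowercase, then drop every character that is not a letter or digit,
# so "4 WHEEL drive" becomes "4wheeldrive"
key <- gsub("[^a-z0-9]", "", tolower(DF$Drive))

# Assumed mapping from normalized key to standard label
lookup <- c(
  "4wheeldrive"    = "4WD",
  "4wd"            = "4WD",
  "fourwheeldrive" = "4WD",
  "awd"            = "AWD"
)

# Keys missing from the lookup come back as NA, which makes gaps easy to spot
DF$Drive_clean <- unname(lookup[key])
```

Running unique(key) on the real column shows which keys still need an entry in the lookup.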
Answer 2:
You can use ifelse and grepl. Change the first argument of grepl to a pattern that matches all your desired cases. The line below searches for strings containing "4" or "four", ignoring case:
df$cleaned_col <- ifelse(grepl('4|four', df$colname_here, ignore.case = TRUE), '4WD', df$colname_here)
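To see what the ifelse and grepl combination does, here is a small sketch on a made-up vector (the values are assumptions based on the examples in the question). Values that match the pattern become "4WD"; everything else is left untouched.

```r
# Hypothetical values modeled on the question's examples
drive <- c("4 WHEEL drive", "4wd", "Four - Wheel - Drive", "AWD", "2 wheel")

# grepl returns TRUE where the pattern matches; ifelse picks the replacement there
cleaned <- ifelse(grepl("4|four", drive, ignore.case = TRUE), "4WD", drive)
# cleaned: "4WD" "4WD" "4WD" "AWD" "2 wheel"
```

Note that the pattern matches a "4" anywhere in the string, so it is worth checking the distinct values in the real column before trusting it.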
If you want to do multiple comparisons, you may want to use dplyr::case_when with %like% from data.table:
library(dplyr); library(data.table)
# Conditions are checked in order; the first one that matches wins
df %>% mutate(cleaned = case_when(colname %like% 'a|b' ~ "there's an a or b in there"
, colname %like% 'c' ~ "has a c in it"
, TRUE ~ "no a or b or c"))
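Applied to the drive-type problem, the same case_when idea might look like the sketch below. It swaps %like% for stringr::str_detect so that everything stays within the tidyverse, as the question asked. The sample data, the column names drive and cleaned, and the patterns are assumptions and would need adjusting to the real values.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data
df <- tibble(drive = c("4 WHEEL drive", "4wd", "Four - Wheel - Drive",
                       "all wheel", "AWD", "2 wheel", "FWD"))

df <- df %>%
  mutate(cleaned = case_when(
    # Conditions are tried top to bottom; the first match wins
    str_detect(drive, regex("4|four", ignore_case = TRUE)) ~ "4WD",
    str_detect(drive, regex("all|awd", ignore_case = TRUE)) ~ "AWD",
    str_detect(drive, regex("2|two", ignore_case = TRUE)) ~ "2WD",
    # Anything unmatched is kept, just uppercased
    TRUE ~ toupper(drive)
  ))
```

Keeping unmatched values in the fallback branch, rather than overwriting them with a fixed label, makes it easy to see which spellings still need a rule.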
Source: https://stackoverflow.com/questions/48629202/change-value-of-all-strings-in-column-based-on-condition