Change value of all strings in column based on condition

Submitted by 白昼怎懂夜的黑 on 2019-12-24 13:39:31

Question


I'm new-ish to R and have a question about data cleaning.

I have a column that records each car's drive type: four-wheel, all-wheel, two-wheel, etc.

The problem is that the values aren't standardized, so some rows have 4 WHEEL drive, 4wd, 4WD, Four - Wheel - Drive, etc.

The first step, uppercasing everything, is easy. The step I'm having trouble with is changing each value to a standard form, like 4WD, without having to recode every unique spelling individually.
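A minimal sketch of that first normalization step, assuming the values sit in a plain character vector (the sample values are hypothetical, modelled on the spellings listed above). Besides uppercasing, it also strips spaces and hyphens, which collapses several variants into the same string before any recoding happens:

```r
# Hypothetical sample values modelled on the variants listed in the question
drive <- c("4 WHEEL drive", "4wd", "4WD", "Four - Wheel - Drive", "all wheel", "2 wheel")

# Remove every character that is not a letter or digit (spaces, hyphens, ...),
# then uppercase, so purely cosmetic differences disappear
normalized <- toupper(gsub("[^[:alnum:]]", "", drive))
print(normalized)
```

After this step "4wd" and "4WD" are identical, and "Four - Wheel - Drive" becomes "FOURWHEELDRIVE", so fewer distinct spellings are left to map.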

Something like: for each value in the column, if the value is LIKE/CONTAINS "FOUR", change it to "4WD".

I've looked into recode, stringdist, and mutate, but I can't find a fit. Written out like this, it sounds like I need a loop, but I'm not sure of the exact syntax.
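For comparison, here is a sketch of the loop described above next to its vectorized equivalent, using hypothetical, already-uppercased values. In R, grepl() and ifelse() work on whole vectors at once, so an explicit loop is usually unnecessary:

```r
# Hypothetical, already-normalized values
drive <- c("FOURWHEELDRIVE", "4WD", "ALLWHEEL")

# Literal translation of "for each value, if it contains FOUR, change it to 4WD"
looped <- drive
for (i in seq_along(looped)) {
  if (grepl("FOUR", looped[i], fixed = TRUE)) {
    looped[i] <- "4WD"
  }
}

# Vectorized equivalent: grepl() tests every element in one call, and ifelse()
# picks "4WD" or the original value element by element
vectorized <- ifelse(grepl("FOUR", drive, fixed = TRUE), "4WD", drive)
print(vectorized)
```

Both produce the same result; the vectorized form is shorter and drops straight into dplyr::mutate().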

If the solution works with the tidyverse, that would be great!


Answer 1:


Welcome to StackOverflow! I've answered your question, but in the future, please include a small sample of your data so it's easier for us to solve your problem. Food for thought: How to make a reproducible example

library(dplyr)  # provides mutate(); plyr is not needed here


# Since you haven't provided a data sample, I'm going to assume your data frame is named "DF" and your column's name is "Drive"

# Set everything to lowercase to reduce the number of distinct spellings
DF <- mutate(DF, Drive = tolower(Drive))


# You'll need one line like this for each replacement, of the form:
#     <column_name> = replace(<column_name>, <condition>, <new value>)
DF <- mutate(DF, Drive = replace(Drive, Drive == "4 wheel drive", "4WD"))
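When many spellings need mapping, one replace() call per value gets long. A swapped-in alternative (not part of this answer) is a named lookup vector, so all mappings live in one place. The data frame and the mappings below are hypothetical:

```r
# Hypothetical data frame with a Drive column, as the answer assumes
DF <- data.frame(Drive = c("4 wheel drive", "4wd", "all wheel drive", "awd", "other"),
                 stringsAsFactors = FALSE)

# Named vector: names are the lowercased raw spellings, values the standard codes
lookup <- c("4 wheel drive" = "4WD", "4wd" = "4WD",
            "all wheel drive" = "AWD", "awd" = "AWD")

DF$Drive <- tolower(DF$Drive)

# Replace only values that appear in the lookup; leave everything else unchanged
matched <- DF$Drive %in% names(lookup)
DF$Drive[matched] <- unname(lookup[DF$Drive[matched]])
print(DF$Drive)
```

Unmatched values such as "other" pass through untouched, so they can be reviewed and added to the lookup later.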



Answer 2:


You can use ifelse and grepl. Change the first argument of grepl to a pattern that matches all your desired cases. The example below matches strings containing "4" or "four", ignoring case:

df$cleaned_col <- ifelse(grepl('4|four', df$colname_here, ignore.case = TRUE), '4WD', df$colname_here)
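A self-contained version of this approach on a hypothetical data frame, where `drive` stands in for `colname_here`:

```r
# Hypothetical data; "drive" stands in for the answer's colname_here
df <- data.frame(drive = c("4 WHEEL drive", "4wd", "Four - Wheel - Drive", "all wheel", "2 wheel"),
                 stringsAsFactors = FALSE)

# "4|four" is a regular expression matching "4" or "four"; ignore.case = TRUE
# also matches "FOUR", "Four", etc.
df$cleaned_col <- ifelse(grepl("4|four", df$drive, ignore.case = TRUE), "4WD", df$drive)
print(df)
```

A bare `4` in the pattern matches any value containing the digit 4 anywhere, so it is worth checking `unique(df$drive)` before relying on a broad pattern.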

If you need multiple comparisons, you may want to use dplyr::case_when with %like% from data.table:

library(dplyr)
library(data.table)  # provides %like%, a case-sensitive regex match
df %>% mutate(cleaned = case_when(colname %like% 'a|b' ~ "there's an a or b in there",
                                  colname %like% 'c'   ~ "has a c in it",
                                  TRUE                 ~ "no a or b or c"))
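A sketch applying case_when() to the drive-type problem itself, with several categories. It swaps %like% for grepl() on a normalized copy of the column so that only dplyr is required; the data and the regular expressions are assumptions to adapt to the real values:

```r
library(dplyr)

# Hypothetical data; "drive" stands in for the answer's colname
df <- data.frame(drive = c("4 WHEEL drive", "4wd", "Four - Wheel - Drive",
                           "all wheel", "AWD", "2 wheel", "2WD", "unknown"),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(
    # Normalize first: drop spaces/hyphens and uppercase
    norm = toupper(gsub("[^[:alnum:]]", "", drive)),
    # case_when() checks conditions in order and uses the first one that is TRUE
    cleaned = case_when(
      grepl("^(4|FOUR)", norm)  ~ "4WD",
      grepl("^(ALL|AWD)", norm) ~ "AWD",
      grepl("^(2|TWO)", norm)   ~ "2WD",
      TRUE ~ NA_character_  # unrecognized values become NA for manual review
    )
  )
print(df)
```

Because unrecognized values become NA, `unique(df$drive[is.na(df$cleaned)])` lists the spellings that still need a rule.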


Source: https://stackoverflow.com/questions/48629202/change-value-of-all-strings-in-column-based-on-condition
