Question
I'm trying to use str_replace_all to replace many different values (e.g. "Mod", "M2", "M3", "Interviewer") with one consistent string (e.g. "Moderator:"). I'm doing this for several different categories, and I want to avoid writing out each unique value because there are a lot of them.
So I made a tibble of all the unique values that I want to standardize and read it in. Then I pulled out each column (there are 5, but only 2 are shown for simplicity) to turn them into vectors:
speak_names <- read_csv("speak_names.csv")
speak_namesMisc <- dplyr::pull(speak_names, Misc)
speak_namesMod <- dplyr::pull(speak_names, Moderator)
For the replacement values, I made character vectors of the same length as the vectors above, because I understand that the replacement and the pattern must be of equal length:
Misc <- rep("Misc:", 2)
Mod <- rep("Moderator:", 28)
When I run Misc through with this code, it works just fine:
atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMisc, Misc)
But when I try the identical Moderator version (even if I attempt to run it before Misc), I get an error message:
atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMod, Mod)
Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
longer object length is not a multiple of shorter object length
I don't know why I'm getting this error because this identical function yields TRUE:
identical(length(speak_namesMod), length(Mod))
The dataframe that I'm working with is 16,244 lines long if that makes any difference to the pattern or replacement. I'm stuck and trying to find out why this isn't working and/or another solution that does not involve typing out each character element in the vectors.
Thank you!
Answer 1:
library('dplyr') # load the dplyr package
library('stringr') # load the stringr package
Here is a sample of my own dataset to answer your question. dput() of my data gives:
abc<-as.data.frame(
structure(list(Name = c("ME-9_ 005", "ME-9_ 004", "ME-9_ 003",
"ME-9_ 002", "ME-9_ 001", "ME-9_ 000", "ME-8_ 005", "ME-8_ 004",
"ME-8_ 003", "ME-8_ 002", "ME-8_ 001", "ME-8_ 000", "ME-7_ 005",
"ME-7_ 004", "ME-7_ 003", "ME-7_ 002", "ME-7_ 001", "ME-7_ 000"
), Mg = c(0.411058647473409, 0.361611969040526, 0.435757145931429,
0.36656632349025, 0.312782034685408, 0.357913661160629, 0.414639893651842,
0.460992875568015, 0.554803107534663, 0.418743792959099, 0.499114614445091,
0.475374442706501, 0.564660334010035, 0.502678818989733, 0.417617035801997,
0.488463005872639, 0.484776757286094, 0.424850010858818),
Al = c(0.575667101719941, 0.586351493923602, 0.574053324307634, 0.628497798862674, 0.552234153060378,
0.580547408629286, 1.05746950789483, 1.07094531357244, 1.11340157804305,
1.03043684466386, 1.02899468191215, 1.07222457991059, 1.5276908007952,
1.66549994904359, 1.43287302441973, 1.37434198093964, 1.55835986529032,
1.66902429579112),
Si = c(0.495188340689301, 0.513374456164654,
0.51809643007659, 0.569128515813393, 0.542590350648068, 0.516673370168739,
1.72437228079744, 1.59076392020817, 1.77327433861292, 1.76671780355934,
1.60625706442694, 1.92449284567535, 3.27248599245035, 3.23739024834759,
2.84115179036218, 2.51112086010829, 2.98829002803169, 2.93347114563903
),
P = c(0.222881184902066, 0.258237982165306, 0.230235867213535,
0.262379290809071, 0.230438623604524, 0.238615393939999, 0.260241811918024,
0.238785817517132, 0.248589968755681, 0.248270048794532, 0.272489046130942,
0.266707140244041, 0.25935282543278, 0.258801008935983, 0.250692297246152,
0.246890941447243, 0.277698144829677, 0.274197618349091)),
row.names = c(NA,
-18L), class = c("tbl_df", "tbl", "data.frame")))
Here is how my data looked before cleaning:
head(abc,10)
But for your specific question, you should pass a named vector, where each name is a pattern and each value is its replacement:
abc$Name <- str_replace_all(
  abc$Name, # column we want to search
  c("001" = "", "002" = "", "003" = "", "004" = "", "005" = "", "000" = "",
    "-" = " ", "_" = "") # each pattern is paired with its replacement
)
Here is how my data looked after cleaning:
head(abc,10)
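As for why the warning shows up: when pattern is an unnamed vector, str_replace_all pairs it element by element with string and recycles the shorter one. 16,244 rows is a multiple of 2 but not of 28, which would explain why the Misc call ran silently (though probably without doing what was intended) while the Moderator call warned. A minimal sketch of the named-vector approach for that case, so nothing has to be typed out by hand; speaker and the four names below are made-up stand-ins for atas_clean$speaker and the real 28 values, and it assumes the names contain no regex metacharacters:

```r
library(stringr)

# Hypothetical stand-ins for the question's data (not the real values)
speaker <- c("Mod", "M2", "Interviewer", "P1", "M3")
speak_namesMod <- c("Mod", "M2", "M3", "Interviewer")

# Names are the patterns, values are the replacement;
# setNames() builds the pairs so none are written by hand
mod_map <- setNames(rep("Moderator:", length(speak_namesMod)), speak_namesMod)

# A named vector makes str_replace_all apply every pattern to every element
cleaned <- str_replace_all(speaker, mod_map)
print(cleaned)
```

The patterns are applied one after another, so a short pattern can also match inside a longer value or inside an earlier replacement; if that is a concern, anchoring each name with "^" and "$" is one option.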
I hope this helps.
Source: https://stackoverflow.com/questions/50842140/replace-multiple-words-in-r-easily-str-replace-all-gives-error-that-two-objects