Question
This was already partially tackled in other posts, but unfortunately I could not get it to run properly on my side.
I have a data frame full of texts, and there are certain words that I want replaced by a unique name.
So, looking at the table below, I want to replace each of the words "Banana", "Apple" and "Tomato" with the word "Fruit" (the word "Fruit" can show up multiple times, that is OK). I also want to replace "Cod", "Pork" and "Beef" with the word "Animals".
I have a full Excel file where this mapping was done. The Excel file has two columns. In column A, we have the unique name (like Fruit and Animals). In column B, we have the words that we want to replace in the text (like Banana, Apple, Tomato).
The code I came up with was:
hous <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="HouseType HouseTypeNo
'Banana Apple Tomato Honey' 'Onion Garlic Pepper Sugar'
'Cod Pork Beef' 'Mushrooms Soya Eggs' ")
maps <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="UniqueID Names
'Fruit' 'Banana'
'Fruit' 'Apple'
'Fruit' 'Tomato'
'Animals' 'Cod'
'Animals' 'Pork'
'Animals' 'Beef'")
hous %>% str_replace_all( pattern = maps$Names, replacement = maps$UniqueID)
# Warning message:
# In stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
#   argument is not an atomic vector; coercing
I cannot make it work. I basically just want to look up certain words and replace them with some unique IDs. It doesn't sound complicated, but I cannot make it run.
Just a few points: in my real data set I have thousands of words and IDs. I have seen in other examples people writing their IDs, patterns and replacements by hand. In my case that is not applicable.
The final output would be something like this:
hous <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="HouseType HouseTypeNo
'Fruit Fruit Fruit Honey' 'Onion Garlic Pepper Sugar'
'Animals Animals Animals' 'Mushrooms Soya Eggs' ")
Any help is appreciated.
Best regards
Answer 1:
You can create a named character vector (names are the words to find, values are their replacements) and pass it to str_replace_all:
hous$HouseType <- stringr::str_replace_all(hous$HouseType,
setNames(maps$UniqueID, maps$Names))
hous
# HouseType HouseTypeNo
#1 Fruit Fruit Fruit Honey Onion Garlic Pepper Sugar
#2 Animals Animals Animals Mushrooms Soya Eggs
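A note on why the original attempt warned: it piped the whole data frame hous into str_replace_all, which expects a character vector, hence "argument is not an atomic vector; coercing". With thousands of words it may also be worth matching whole words only, since each name is treated as a regular expression and would otherwise match inside longer words. The following is only a sketch under some assumptions: the words contain no regex metacharacters (otherwise they would need escaping first), every column of hous should be processed, and the name lookup is an illustrative choice, not something from the question.

```r
library(stringr)

# Same toy data as in the question, built with data.frame() for clarity
hous <- data.frame(
  HouseType   = c("Banana Apple Tomato Honey", "Cod Pork Beef"),
  HouseTypeNo = c("Onion Garlic Pepper Sugar", "Mushrooms Soya Eggs"),
  stringsAsFactors = FALSE
)
maps <- data.frame(
  UniqueID = c("Fruit", "Fruit", "Fruit", "Animals", "Animals", "Animals"),
  Names    = c("Banana", "Apple", "Tomato", "Cod", "Pork", "Beef"),
  stringsAsFactors = FALSE
)

# Named character vector: names are regex patterns, values are replacements.
# \\b restricts each pattern to whole words (assumes no regex metacharacters
# in maps$Names).
lookup <- setNames(maps$UniqueID, paste0("\\b", maps$Names, "\\b"))

# Apply the replacement column by column; hous[] keeps the data frame shape
hous[] <- lapply(hous, str_replace_all, pattern = lookup)
hous

# Whole-word matching leaves longer words alone
boundary_check <- str_replace_all("Codfish Cod", lookup)
boundary_check
# "Codfish Animals"
```

In the real case, maps would come from the Excel file instead of being typed in, as long as it ends up with the same two columns.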
Source: https://stackoverflow.com/questions/63992944/r-replace-multiple-patterns-with-multiple-ids