R - Replace multiple patterns with multiple ids

Submitted by 雨燕双飞 on 2021-02-17 06:22:08

Question


This has already been partially addressed in other posts, but unfortunately I could not get it to work on my side.

I have a data frame full of texts, and there are certain words that I want replaced by a unique name.

So, looking at the table below, I would like to replace each of the words "Banana", "Apple" and "Tomato" with the word "Fruit" (it is fine for "Fruit" to appear multiple times). I also want to replace "Cod", "Pork" and "Beef" with the word "Animals".

I have a full Excel file where this mapping is defined. It has two columns: column A holds the unique name (like Fruit and Animals), and column B holds the words to replace in the text (like Banana, Apple, Tomato).
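A sketch of turning such a two-column sheet into a lookup vector (names are the words to find, values are the IDs), so nothing has to be typed by hand. It assumes column A holds the IDs and column B the words, as described above. The helper `mapping_to_lookup()` and the file name `mapping.xlsx` are hypothetical; reading the file with `readxl::read_excel()` is left as a comment so the sketch runs without it.

```r
# Hypothetical helper: build a named lookup vector from a two-column mapping.
# Column 1 (A) = unique ID, column 2 (B) = word to replace.
mapping_to_lookup <- function(mapping) {
  ids   <- trimws(as.character(mapping[[1]]))
  words <- trimws(as.character(mapping[[2]]))
  # A word listed more than once is potentially ambiguous; flag it early.
  dup <- unique(words[duplicated(words)])
  if (length(dup) > 0) {
    warning("Words mapped more than once: ", paste(dup, collapse = ", "))
  }
  setNames(ids, words)
}

# With the real file (hypothetical name), something like:
# mapping <- readxl::read_excel("mapping.xlsx")
mapping <- data.frame(
  UniqueID = c("Fruit", "Fruit", "Animals"),
  Names    = c("Banana", " Apple", "Cod"),  # stray whitespace is trimmed
  stringsAsFactors = FALSE
)
lookup <- mapping_to_lookup(mapping)
print(lookup)
```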

The code I came up with was:

library(stringr)   # str_replace_all()
library(magrittr)  # %>%

hous <- read.table(header = TRUE,
                   stringsAsFactors = FALSE,
                   text = "HouseType HouseTypeNo
'Banana Apple Tomato Honey' 'Onion Garlic Pepper Sugar'
'Cod Pork Beef' 'Mushrooms Soya Eggs' ")

maps <- read.table(header = TRUE,
                   stringsAsFactors = FALSE,
                   text = "UniqueID Names
'Fruit' 'Banana'
'Fruit' 'Apple'
'Fruit' 'Tomato'
'Animals' 'Cod'
'Animals' 'Pork'
'Animals' 'Beef'")

# Attempted replacement (produces the warning below)
hous %>% str_replace_all(pattern = maps$Names, replacement = maps$UniqueID)
# Warning message:
# In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
#   argument is not an atomic vector; coercing
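The warning comes from passing the whole data frame `hous` (a list) where `str_replace_all()` expects a character vector, so it gets coerced. There is a second issue: when `pattern` and `replacement` are plain vectors, they are paired element by element with the strings (with recycling) rather than treated as a dictionary. A small illustration with toy strings:

```r
library(stringr)

x <- c("a b", "a b")

# Plain vectors: string 1 only gets pattern "a" -> "X",
# string 2 only gets pattern "b" -> "Y".
pairwise <- str_replace_all(x, c("a", "b"), c("X", "Y"))
print(pairwise)    # [1] "X b" "a Y"

# Named vector: every pattern is applied to every string.
dictionary <- str_replace_all(x, c("a" = "X", "b" = "Y"))
print(dictionary)  # [1] "X Y" "X Y"
```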

I cannot make it work. I basically just want to look up certain words and replace them with some unique IDs. It doesn't sound complicated, but I cannot get it to run.

A few points: in my real data set I have thousands of words and IDs. In other examples I have seen people write their IDs, patterns and replacements by hand; in my case that is not feasible.
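With thousands of words, a regex-free alternative may be worth considering: split each string into words and look every word up in a named vector. This is a swapped-in technique, not the approach from the answer, and it assumes words are separated by single spaces, carry no attached punctuation, and that there are no `NA` strings. The helper name `replace_words()` is hypothetical. Because `%in%` and name indexing are built on `match()`, which uses hashing, it should scale reasonably with a large lookup, and it only ever replaces whole words.

```r
# Hypothetical helper: whole-word lookup without regular expressions.
# Assumes single-space separators, no punctuation glued to words, no NA.
replace_words <- function(x, lookup) {
  pieces <- strsplit(x, " ", fixed = TRUE)
  out <- vapply(pieces, function(words) {
    hit <- words %in% names(lookup)   # which words have a mapping
    words[hit] <- lookup[words[hit]]  # swap them for their IDs
    paste(words, collapse = " ")
  }, character(1))
  unname(out)
}

lookup <- c(Banana = "Fruit", Apple = "Fruit", Tomato = "Fruit",
            Cod = "Animals", Pork = "Animals", Beef = "Animals")

res <- replace_words(c("Banana Apple Tomato Honey", "Cod Pork Beef"), lookup)
print(res)
# [1] "Fruit Fruit Fruit Honey" "Animals Animals Animals"
```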

The final output would be something like this:

hous <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="HouseType HouseTypeNo
'Fruit Fruit Fruit Honey' 'Onion Garlic Pepper Sugar'
'Animals Animals Animals' 'Mushrooms Soya Eggs' ")
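The expected output only changes `HouseType`, but if the mapping should be applied to every text column at once, one option is a sketch like the following, which passes a named pattern vector to `str_replace_all()` for each character column. The column selection via `is.character` is an assumption about the real data.

```r
library(stringr)

hous <- read.table(header = TRUE, stringsAsFactors = FALSE,
                   text = "HouseType HouseTypeNo
'Banana Apple Tomato Honey' 'Onion Garlic Pepper Sugar'
'Cod Pork Beef' 'Mushrooms Soya Eggs'")

maps <- data.frame(
  UniqueID = rep(c("Fruit", "Animals"), each = 3),
  Names    = c("Banana", "Apple", "Tomato", "Cod", "Pork", "Beef"),
  stringsAsFactors = FALSE
)

# Named vector: names are the patterns, values are the replacements.
lookup <- setNames(maps$UniqueID, maps$Names)

# Only touch character columns; assigning into hous[chr] keeps the data frame.
chr <- vapply(hous, is.character, logical(1))
hous[chr] <- lapply(hous[chr], str_replace_all, pattern = lookup)
print(hous)
```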

Any help is appreciated.

Best regards


Answer 1:


You can create a named vector (names are the patterns, values are the replacements) and pass it as the pattern to str_replace_all():

hous$HouseType <- stringr::str_replace_all(hous$HouseType, 
                            setNames(maps$UniqueID, maps$Names))
hous

#                HouseType               HouseTypeNo
#1 Fruit Fruit Fruit Honey Onion Garlic Pepper Sugar
#2 Animals Animals Animals       Mushrooms Soya Eggs
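One caveat with this approach: the names are treated as regular expressions that match anywhere, so `Apple` would also be replaced inside `Applesauce`. Wrapping each word in `\\b` word boundaries restricts matches to whole words. A sketch, assuming the words contain no regex metacharacters (such as `.` or `(`); those would need escaping first:

```r
library(stringr)

maps <- data.frame(
  UniqueID = c("Fruit", "Fruit", "Animals"),
  Names    = c("Apple", "Banana", "Cod"),
  stringsAsFactors = FALSE
)

x <- c("Applesauce Apple", "Codfish Cod Banana")

# Plain names also match inside longer words.
loose <- str_replace_all(x, setNames(maps$UniqueID, maps$Names))
print(loose)   # [1] "Fruitsauce Fruit"          "Animalsfish Animals Fruit"

# \\b anchors each pattern to word boundaries.
strict <- str_replace_all(x, setNames(maps$UniqueID,
                                      paste0("\\b", maps$Names, "\\b")))
print(strict)  # [1] "Applesauce Fruit"      "Codfish Animals Fruit"
```

Note also that named patterns are applied one after another, so if an ID could itself be matched by a later pattern, it would be replaced again.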


Source: https://stackoverflow.com/questions/63992944/r-replace-multiple-patterns-with-multiple-ids
