How to gsub on the text between two words in R?

感动是毒 2021-01-22 08:42

EDIT:

I would like to place a \\n before a specific unknown word in my text. I know that the first time the unknown word appears i

1 Answer
  • 2021-01-22 08:50

    You need to find the unknown word between "Tree" and "Lake" first. You can use

    unknown_word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)
    

    The pattern matches any characters up to the last Tree in the string that is followed by word characters and Lake, captures the unknown word (\w+ = one or more word characters) up to Lake, and then matches the rest of the string. The substitution is applied to every string in the vector, and strings without a match are returned unchanged. You can access the first result with the [[1]] index.
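    A minimal sketch of the extraction step, assuming the sample vector text <- c("TreeRULakeSunWater", "A B C RU D") (reconstructed from the console output in the UPDATE; the question's own data is truncated). Because gsub() and sub() return non-matching strings unchanged, this version first keeps only the matching elements with grepl() and then extracts the word with sub(). The names pattern, has_match and unknown_word are illustrative.

```r
# Assumed sample input, reconstructed from the console output (the question's data was truncated)
text <- c("TreeRULakeSunWater", "A B C RU D")

pattern <- ".*Tree(\\w+)Lake.*"

# gsub()/sub() return non-matching strings unchanged, so keep only matching elements
has_match <- grepl(pattern, text)

# The pattern consumes the whole string, so sub() (a single replacement) is enough
unknown_word <- sub(pattern, "\\1", text[has_match])
print(unknown_word)
```

    Filtering with grepl() avoids picking up a whole unmatched string by accident when the matching element is not the first one in the vector.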

    Then, when you know the word, replace it with

    gsub(paste0("[[:space:]]*(", unknown_word[[1]], ")[[:space:]]*"), " \n\\1 ", text)
    

    See IDEONE demo.

    Here, the pattern is [[:space:]]*( + unknown_word[[1]] + )[[:space:]]*. It matches zero or more whitespace characters on both sides of the unknown word, plus the word itself (captured into Group 1). In the replacement, the surrounding whitespace is collapsed into a single space (or one is added if there was none), \n inserts the line break, and \\1 restores the unknown word. You may replace [[:space:]] with \\s.
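    A quick check of the replacement step, assuming the same reconstructed sample vector and that the extracted word is "RU" (the variable names are illustrative). It confirms that the [[:space:]] and \\s spellings give identical results, and it shows that without word boundaries the RU inside "TreeRULakeSunWater" is split as well, which is what the UPDATE addresses.

```r
# Assumed sample input and the word extracted in the first step
text <- c("TreeRULakeSunWater", "A B C RU D")
word <- "RU"

# Pattern from the answer, using the POSIX [[:space:]] class
res_posix <- gsub(paste0("[[:space:]]*(", word, ")[[:space:]]*"), " \n\\1 ", text)

# Same pattern written with the \\s shorthand
res_s <- gsub(paste0("\\s*(", word, ")\\s*"), " \n\\1 ", text)

stopifnot(identical(res_posix, res_s))

# cat() renders the inserted line breaks; note that the first element is split too
cat(res_posix, sep = "\n----\n")
```

    Since \w+ only captures letters, digits and underscores, the extracted word contains no regex metacharacters and can be pasted into the second pattern without escaping.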

    UPDATE

    If you only need to add a newline before RU when it is a whole word, use the \b word boundary:

    > gsub(paste0("[[:space:]]*\\b(", unknown_word[[1]], ")\\b[[:space:]]*"), " \n\\1 ", text)
    [1] "TreeRULakeSunWater" "A B C \nRU D"   
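    The two steps can be combined into one function. newline_before_word() below is a hypothetical helper, not part of any package, and its left/right arguments (defaulting to "Tree" and "Lake") are assumptions for illustration. It passes perl = TRUE so that \w and \b follow PCRE rules, whereas the answer's own calls use R's default regex engine; on the reconstructed sample input it should reproduce the console output above.

```r
# Hypothetical helper (not from any package): extract the word between `left` and
# `right`, then put a newline before each whole-word occurrence of it.
# `left` and `right` are used as regex fragments, so they must not contain
# unescaped metacharacters.
newline_before_word <- function(text, left = "Tree", right = "Lake") {
  pattern <- paste0(".*", left, "(\\w+)", right, ".*")
  hits <- grepl(pattern, text, perl = TRUE)
  if (!any(hits)) return(text)  # nothing to extract, leave the input unchanged

  # Take the word from the first matching element
  word <- sub(pattern, "\\1", text[hits][1], perl = TRUE)

  # \\b keeps the RU inside "TreeRULake" from being split
  gsub(paste0("\\s*\\b(", word, ")\\b\\s*"), " \n\\1 ", text, perl = TRUE)
}

text <- c("TreeRULakeSunWater", "A B C RU D")  # assumed sample input
print(newline_before_word(text))
```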
    