EDIT:
I would like to place a \\n before a specific unknown word in my text. I know that the first time the unknown word appears i
You need to find the unknown word between "Tree" and "Lake" first. You can use
unknown_word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)
The pattern matches any characters up to the last Tree in the string, then captures the unknown word (\w+ = one or more word characters) up to Lake, and then matches the rest of the string, so the whole string is replaced with the captured word. gsub is applied to every element of the vector, and elements that do not match the pattern are returned unchanged. You can access the first result with the [[1]] index.
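A minimal sketch of the extraction step. The sample text vector is an assumption, inferred from the output printed in the UPDATE part of this answer; the variable name word is only for illustration.

```r
# Sample input (assumed; inferred from the output shown in the UPDATE section).
text <- c("TreeRULakeSunWater", "A B C RU D")

# Replace each whole string with the word captured between "Tree" and "Lake".
unknown_word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)

# The first element matched and is reduced to the captured word;
# the second element did not match and is returned unchanged.
print(unknown_word)

# Only the first element holds the extracted word.
word <- unknown_word[[1]]
print(word)
```

Because non-matching elements pass through untouched, it is safer to index the element you know contains Tree...Lake than to assume every element of the result is the extracted word.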
Then, when you know the word, replace it with
gsub(paste0("[[:space:]]*(", unknown_word[[1]], ")[[:space:]]*"), " \n\\1 ", text)
See IDEONE demo.
Here, you have the [[:space:]]*( + unknown_word[[1]] + )[[:space:]]* pattern. It matches zero or more whitespace characters on both sides of the unknown word, and the unknown word itself (captured into Group 1). In the replacement, the whitespace on each side is collapsed to a single space (or a space is added if there was none), a newline is inserted before the word, and \\1 restores the unknown word. You may replace [[:space:]] with \\s.
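To illustrate the whitespace handling, here is a small sketch. The samples vector and the hard-coded word are hypothetical, chosen only to show the case with several spaces around the word and the case with none before it.

```r
# Hypothetical inputs: extra whitespace around the word, and no whitespace before it.
samples <- c("A B C   RU   D", "A B CRU D")
word <- "RU"  # assumed to have been extracted already

# Build the pattern around the known word; Group 1 is the word itself.
pattern <- paste0("[[:space:]]*(", word, ")[[:space:]]*")
result <- gsub(pattern, " \n\\1 ", samples)

# Same replacement with \\s in place of [[:space:]].
pattern_s <- paste0("\\s*(", word, ")\\s*")
result_s <- gsub(pattern_s, " \n\\1 ", samples)

# Both inputs end up with exactly one space, a newline, the word, and one space.
print(result)
print(identical(result, result_s))
```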
UPDATE
If you need to add a newline only before occurrences of RU that are whole words, use the \b word boundary:
> gsub(paste0("[[:space:]]*\\b(", unknown_word[[1]], ")\\b[[:space:]]*"), " \n\\1 ", text)
[1] "TreeRULakeSunWater" "A B C \nRU D"
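The difference between the two variants can be seen side by side in the sketch below. The text vector is an assumption inferred from the output above, and the names loose and strict are only for illustration.

```r
# Sample input (assumed; inferred from the printed output above).
text <- c("TreeRULakeSunWater", "A B C RU D")
word <- "RU"

# Without word boundaries: RU is matched even inside a longer word.
loose <- gsub(paste0("[[:space:]]*(", word, ")[[:space:]]*"), " \n\\1 ", text)

# With \\b on both sides: RU is matched only when it stands alone as a word.
strict <- gsub(paste0("[[:space:]]*\\b(", word, ")\\b[[:space:]]*"), " \n\\1 ", text)

print(loose)
print(strict)
```

With the boundaries in place, the first element is left untouched because RU there is surrounded by other word characters.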