Question
For the life of me, I am unable to strip out some escape characters from a text string (prior to further processing). I've tried stringi, gsub, but I just cannot get the correct syntax.
Here is my text string
txt <- "c(\"\\r\\n Stuff from a webpage: That I scraped using webcrawler\\r\\n\", \"\\r\\n \", \"\\r\\n \", \"\\r\\n \", \"\\r\\n\\r\\n \", \"\\r\\n\\r\\n \", \"\\r\\n \\r\\n \", \"\\r\\n \")"
I'd like to strip out "\\r\\n" from this string.
I've tried
gsub("[\\\r\\\n]", "", txt) (leaves me with "rn")
gsub("[\\r\\n]", "", txt) (leaves me without ANY r or n in the text)
gsub("[\r\n]", "", txt) (strips nothing)
How can I remove these characters? Bear in mind that this will need to work over other entries that may have normal words ending in "rn" or have "rn" in the middle somewhere!
Thanks!
Answer 1:
Not very pretty, but this works:
library(stringr)
str_remove_all(txt, "(?<=\\\\n)\\s+|\\s+(?=\\\")|\\\"|(?<=\\\"),|\\\\r(?=\\\\n)|(?<=\\\\r)\\\\n")
[1] "c(Stuff from a webpage: That I scraped using webcrawler)"
I'm sure there are more efficient regex solutions, but I just fed it every possibility of things you don't want.
I also got rid of all the extra escaped quotes (\"), the commas, and the white space.
If you just want to match the result that you posted above:
str_remove_all(txt, "\\\\r(?=\\\\n)|(?<=\\\\r)\\\\n")
This reads: remove any instance of \\r followed by \\n, or any \\n preceded by \\r.
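To see that reading in action, here is a minimal sketch on a made-up sample. It assumes base R's gsub with perl = TRUE as a stand-in for str_remove_all (both accept lookarounds), and sample_txt is a hypothetical string, not data from the question. The point is that words containing "rn" are left alone, because the pattern only fires on a literal backslash followed by r or n.

```r
# Hypothetical sample: literal backslash-r backslash-n between two words containing "rn"
sample_txt <- "modern\\r\\n barn"

# Same lookaround pattern as above; perl = TRUE enables (?=...) and (?<=...) in base R
pattern <- "\\\\r(?=\\\\n)|(?<=\\\\r)\\\\n"
cleaned <- gsub(pattern, "", sample_txt, perl = TRUE)

# "modern" and "barn" should survive; only the \r\n marker should be gone
print(cleaned)
```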
Answer 2:
At the risk of answering my own question too quickly, I've found a bodge workaround which simply involves switching out the "\" for a rare place holder, "__", then replacing that:
gsub('__r__n', '', gsub('[\\\\]', '__', txt))
... but it would be valuable I think to share a better "one hit" solution.
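One possible "one hit" version, offered as a sketch rather than the definitive fix: the attempts in the question put the characters inside [ ], which makes a character set, so every single \, r, or n is removed on its own. Matching the four characters as a sequence avoids that. The example uses a shortened copy of txt from the question, and out_fixed / out_regex are just illustrative names.

```r
# Shortened version of the string from the question:
# it contains a literal backslash, r, backslash, n (not real CR/LF characters)
txt <- "c(\"\\r\\n Stuff from a webpage: That I scraped using webcrawler\\r\\n\", \"\\r\\n \")"

# Option 1: literal matching. With fixed = TRUE the pattern "\\r\\n"
# is taken as the four characters \ r \ n, with no regex interpretation.
out_fixed <- gsub("\\r\\n", "", txt, fixed = TRUE)

# Option 2: regex matching as a sequence (no brackets).
# Each literal backslash needs \\\\ in the R string.
out_regex <- gsub("\\\\r\\\\n", "", txt)

# Both approaches should agree, and "webcrawler" keeps its r's
identical(out_fixed, out_regex)
cat(out_fixed, "\n")
```

fixed = TRUE is probably the easier of the two to read, since there is only one level of escaping (R's string escaping) to think about.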
Source: https://stackoverflow.com/questions/51384784/how-to-replace-r-n-characters-in-a-text-string-specifically-in-r