I have a string that mixes letters and numbers:
"The sample is 22mg"
I'd like to split strings where a number is immediately followed by a letter, like this:
"The sample is 22 mg"
I've tried this:
gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg')
but am not getting the desired results.
Any suggestions?
You need to use capturing parentheses in the regular expression and group references in the replacement. For example:
gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg')
There's nothing R-specific here; the R help pages for regex and gsub (?regex and ?gsub) should be of some use.
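To illustrate why the attempt in the question cannot work: gsub inserts the replacement argument as literal text (apart from backreferences such as \\1), so a character class written there is never interpreted as a pattern. A minimal sketch, where the variable names x, literal, and fixed are only illustrative:

```r
# The replacement string is inserted as-is; it is not parsed as a regex.
x <- "22mg"
literal <- gsub("[0-9]+", "[0-9]+ ", x)
print(literal)  # "[0-9]+ mg"

# With capture groups, the matched digit and letter are reused via \\1 and \\2.
fixed <- gsub("([0-9])([[:alpha:]])", "\\1 \\2", x)
print(fixed)    # "22 mg"
```

The second call only needs to capture the last digit and the first letter, since the space goes exactly between those two characters.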
You need backreferencing:
> test <- "The sample is 22mg"
> gsub("([0-9])([a-zA-Z])", "\\1 \\2", test)
[1] "The sample is 22 mg"
Anything in parentheses is captured. The captured groups are then referenced in the replacement as \1 (for the first group in parentheses), \2, and so on. Inside an R string literal the backslash itself has to be escaped, which is why you write "\\1": R turns it into \1 before passing it to the regular expression engine.
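As a rough sketch of how this generalizes: gsub is vectorized, so the same pattern can be applied to a whole character vector at once. The sample strings below are made up for illustration, and the lookaround variant with perl = TRUE is an alternative that was not part of the answers above; it matches the empty position between a digit and a letter instead of capturing the characters.

```r
# Hypothetical sample inputs, including one with nothing to split.
samples <- c("The sample is 22mg", "dose 5ml twice", "no digits here")

# Backreference version: capture the digit and the letter, put a space between.
with_groups <- gsub("([0-9])([a-zA-Z])", "\\1 \\2", samples)

# Lookaround version (needs perl = TRUE): replace the zero-width position
# that has a digit before it and a letter after it with a space.
with_lookaround <- gsub("(?<=[0-9])(?=[a-zA-Z])", " ", samples, perl = TRUE)

print(with_groups)
print(with_lookaround)
```

Both calls should give the same result here; strings without a digit directly followed by a letter are returned unchanged.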
Source: https://stackoverflow.com/questions/11605564/r-regex-gsub-separate-letters-and-numbers