Question
I need to clean up some data strings that contain either words followed by a number, or just a number.
Below is a toy sample:
library(tidyverse)
c("555","Word 123", "two words 123", "three words here 123") %>%
sub("(\\w+) (\\d*)", "\\1|\\2", .)
The result is this:
[1] "555" "Word|123" "two|words 123" "three|words here 123"
But I want to place the '|' before the last set of numbers, as shown below:
[1] "|555" "Word|123" "two words|123" "three words here|123"
Answer 1:
We can use sub to match zero or more spaces (\\s*) followed by a digit we capture as a group ((\\d)), and in the replacement use the | followed by the backreference (\\1) of the captured group.
sub("\\s*(\\d)", "|\\1", v1)
#[1] "|555" "Word|123"
#[3] "two words|123" "three words here|123"
data
v1 <- c("555","Word 123", "two words 123", "three words here 123")
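One caveat worth checking: because sub replaces only the first match, this pattern puts the | before the first digit in the string. That coincides with the last number group for the sample data, but not when digits appear earlier. A minimal sketch, where the second element of x is a hypothetical input that is not in the question:

```r
# Hypothetical input with digits before the final number group
# (the second element is not part of the original sample data).
x <- c("Word 123", "room 12 floor 3")

# Same pattern as above: sub() replaces only the first match,
# so the "|" lands before the first digit in each string.
res_first_digit <- sub("\\s*(\\d)", "|\\1", x)
print(res_first_digit)
#[1] "Word|123"        "room|12 floor 3"
```

If the strings may contain digits before the final number, a pattern anchored at the end of the string is the safer choice.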
Answer 2:
You may use
^(.*?)\s*(\d*)$
Replace with \1|\2.
In R:
sub("^(.*?)\\s*(\\d*)$", "\\1|\\2", .)
Details
^ - start of string
(.*?) - Capturing group 1: any 0+ chars, as few as possible
\s* - zero or more whitespaces
(\d*) - Capturing group 2: zero or more digits
$ - end of string.
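The anchored pattern can be tried on the sample vector as a self-contained sketch. Two things here are assumptions of the sketch rather than part of the answer: perl = TRUE is passed so that the lazy quantifier is handled by PCRE, and the vector extra_in holds hypothetical inputs that are not in the question.

```r
v1 <- c("555", "Word 123", "two words 123", "three words here 123")

# perl = TRUE is an assumption of this sketch (the answer omits it);
# it makes PCRE handle the lazy quantifier (.*?).
res <- sub("^(.*?)\\s*(\\d*)$", "\\1|\\2", v1, perl = TRUE)
print(res)
#[1] "|555"                 "Word|123"
#[3] "two words|123"        "three words here|123"

# Hypothetical extra inputs: digits in the middle, and no digits at all.
extra_in <- c("room 12 floor 3", "no digits")
extra <- sub("^(.*?)\\s*(\\d*)$", "\\1|\\2", extra_in, perl = TRUE)
print(extra)
#[1] "room 12 floor|3" "no digits|"
```

Because group 2 allows zero digits, a string without a trailing number still matches and gets a | appended at the end. Changing (\\d*) to (\\d+) would leave such strings untouched instead.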
Source: https://stackoverflow.com/questions/55856172/r-separate-words-from-numbers-in-string