Question
I have a string that looks like the following:
x <- "01(01)121210(01)0001"
I want to split this into a vector so that I get the following:
[1] "0" "1" "(01)" "1" "2" "1" "2" "1" "0" "(01)" "0" "0" "0" "1"
The (|) could be [|] or {|} and the number of digits between the brackets can be 2 or more.
I've been trying to do this by separating on the brackets first:
unlist(strsplit(x, "(?<=[\\]\\)\\}])", perl=T))
[1] "01(01)" "121210(01)" "0001"
or unlist(strsplit(x, "(?<=[\\[\\(\\{])", perl=T))
[1] "01(" "01)121210(" "01)0001"
but I can't find a way to combine the two together. Then, I was hoping to split the elements not containing the brackets.
I'd be really grateful if someone could help me out with this or knows of a more elegant way to do it.
Many thanks!
Answer 1:
This is another way:
unlist(strsplit(x, '\\([^)]*\\)(*SKIP)(*F)|(?=)', perl=T))
# [1] "0" "1" "(01)" "1" "2" "1" "2" "1" "0" "(01)" "0" "0" "0" "1"
\\([^)]*\\) matches anything in parentheses, and (*SKIP)(*F) tells the regular expression engine to fail on that match and, having found it, not to re-test that part of the string against the alternative on the other side of the |. That alternative is (?=), an empty lookahead that matches the zero-width position between characters.
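The question also mentions [...] and {...} groups, which the pattern above does not cover. A minimal sketch of one way to extend the same (*SKIP)(*F) idea, assuming groups are never nested and are always closed. The test string y and the variable names are made up for illustration:

```r
# Hypothetical input mixing all three bracket styles (not from the question).
y <- "01[01]12{012}0(01)"

# Any one bracketed group: (...), [...] or {...}.
bracketed <- "(?:\\([^)]*\\)|\\[[^\\]]*\\]|\\{[^}]*\\})"

# Skip over bracketed groups, then split at every remaining position.
pattern <- paste0(bracketed, "(*SKIP)(*F)|(?=)")

unlist(strsplit(y, pattern, perl = TRUE))
```

Each bracketed group should come back as a single element, with every digit outside a group as its own element.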
Answer 2:
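Set perl = TRUE and split the input string on the pattern below. Because the lookahead \d\) only guards a single digit before the closing parenthesis, this pattern assumes exactly two digits inside the parentheses.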
(?<!\(|^)(?!\)|\d\)|$)
Written as an R string, with the backslashes doubled, the regex is:
"(?<!\\(|^)(?!\\)|\\d\\)|$)"
Answer 3:
Another possible way:
unlist(strsplit(x, '(?!\\(?\\d*\\))', perl=T))
This is shorter, but less efficient, than Matthew Plourde's approach.
Alternatively, along the lines of G. Grothendieck's answer, extract the tokens instead of splitting:
m<-gregexpr("\\d|\\([^)]*\\)", x)
regmatches(x, m)
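The extraction approach also extends to the [...] and {...} groups mentioned in the question. A small sketch, assuming groups are never nested and are always closed. The helper name split_tokens is hypothetical:

```r
# Sketch: extract tokens instead of splitting. A token is either a single
# digit or one bracketed group: (...), [...] or {...}.
split_tokens <- function(s) {  # hypothetical helper name
  token <- "\\d|\\([^)]*\\)|\\[[^\\]]*\\]|\\{[^}]*\\}"
  # gregexpr finds every match; regmatches pulls out the matched text.
  regmatches(s, gregexpr(token, s, perl = TRUE))[[1]]
}

x <- "01(01)121210(01)0001"
split_tokens(x)
# elements: 0, 1, (01), 1, 2, 1, 2, 1, 0, (01), 0, 0, 0, 1
```

Unlike strsplit, this silently drops any character that is neither a digit nor part of a bracketed group, which may or may not be what you want.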
Answer 4:
This can be done without zero-width lookahead/lookbehind expressions by using strapply in the gsubfn package. The regular expression matches either a single digit or a ( followed by everything up to the next ).
library(gsubfn)
strapply(x, "\\d|\\(.*?\\)", c, perl = TRUE)[[1]]
giving:
[1] "0" "1" "(01)" "1" "2" "1" "2" "1" "0" "(01)"
[11] "0" "0" "0" "1"
Note: In the example shown in the question the part inside (...) is always two digits. If that is always the case it can be simplified further to:
strapplyc(x, "\\d|\\(...")[[1]]
Source: https://stackoverflow.com/questions/25160197/r-strsplit-before-and-after-keeping-both-delimiters