I have the following string
string <- c('a - b - c - d',
            'z - c - b',
            'y',
            'u - z')
I would try this regex:
(\w(?:\s+-\s+\w)?).*
For an explanation of the regex, see https://regex101.com/r/BbfsNQ/2.
That regex retrieves the first pair if one exists, or just the first character if there is no pair. The match is stored in a capturing group. How you display captured groups depends on the language you use, but in plain regex syntax \1 refers to the first group (\2 to the second, and so on). Look at the "Substitution" section on regex101 if you want a visual example. Note that \w matches a single word character, so the pattern assumes one-character items as in the sample data.
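As a rough sketch of how this could look in R (the answer above only gives the bare regex, so the call below is an assumption about how to apply it): `perl = TRUE` is passed so that PCRE syntax such as `\w`, `\s` and `(?:...)` is certain to be available, and the variable name `first_pair` is purely illustrative.

```r
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# Backslashes are doubled because the pattern sits inside an R string literal.
# Group 1 holds one word character, optionally followed by " - " and another one;
# the trailing .* consumes the rest so that sub() replaces the whole string with \1.
first_pair <- sub("(\\w(?:\\s+-\\s+\\w)?).*", "\\1", string, perl = TRUE)

print(first_pair)
```

Because the optional group simply fails to match when there is no second item, strings such as 'y' come back unchanged.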
Note that you cannot use a negated character class to negate a sequence of characters. [^ - ]*$ matches any 0+ chars other than a space (yes, it matches - too, because the - creates a range between a space and a space), followed by the end-of-string marker ($).
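A small check can make the point about the character class concrete. This is only an illustrative sketch: the test strings and variable names are made up, the pattern is anchored with ^ so that the whole string must consist of allowed characters, and `perl = TRUE` is an assumption made to pin down the regex engine.

```r
# [^ - ] contains a range from space to space, so the only excluded character is the space.
hyphen_ok <- grepl("^[^ - ]*$", "a-b", perl = TRUE)  # the hyphen is accepted by the class
space_ok  <- grepl("^[^ - ]*$", "a b", perl = TRUE)  # the space is rejected

# The class therefore behaves exactly like the simpler [^ ].
samples <- c("a-b", "a b", "-")
same_as_simple <- identical(
  grepl("^[^ - ]*$", samples, perl = TRUE),
  grepl("^[^ ]*$",   samples, perl = TRUE)
)

print(c(hyphen_ok = hyphen_ok, space_ok = space_ok, same_as_simple = same_as_simple))
```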
You may use the sub function with the following regex:
^(.*? - .*?) - .*
and replace with \1. See the regex demo.
R code:
> string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
> sub("^(.*? - .*?) - .*", "\\1", string)
[1] "a - b" "z - c" "y" "u - z"
Details:
^ - start of a string
(.*? - .*?) - Group 1 (referred to with the \1 backreference in the replacement pattern) capturing any 0+ chars lazily up to the first space, hyphen, space and then again any 0+ chars up to the next leftmost occurrence of space, hyphen, space
" - " - a space, a hyphen and a space
.* - any zero or more chars up to the end of the string.
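To see why the lazy quantifiers matter, the pattern can be compared with a greedy variant. This is a sketch for illustration only: the greedy pattern is not part of the answer, the variable names are hypothetical, and `perl = TRUE` is assumed so that both calls use the same engine.

```r
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# Lazy version from the answer: Group 1 stops at the earliest possible " - ".
lazy <- sub("^(.*? - .*?) - .*", "\\1", string, perl = TRUE)

# Greedy variant (illustration only): Group 1 grows as far as it can
# while still leaving one " - " and a tail after it.
greedy <- sub("^(.* - .*) - .*", "\\1", string, perl = TRUE)

print(lazy)
print(greedy)
```

Strings with fewer than two separators ('y' and 'u - z') do not match either pattern, so sub returns them unchanged.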