Exclude everything after the second occurrence of a certain string

清酒与你 2021-01-18 07:39

I have the following string

string <- c('a - b - c - d',
            'z - c - b',
            'y',
            'u - z')

I would

2 answers
  • 2021-01-18 07:49

    Try this: (\w(?:\s+-\s+\w)?).*. For an explanation of the regex, see https://regex101.com/r/BbfsNQ/2.

    This regex captures the first pair (e.g. a - b) if one exists, or just the first word character if there is no pair, and stores it in a capturing group. How you retrieve a captured group depends on the language, but in a replacement pattern the first group is usually referenced as \1 (\2 for the second, and so on). See the "Substitution" panel on regex101 for a visual example.
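    A minimal R sketch of this answer's approach, assuming base sub() with perl = TRUE so that the non-capturing group (?:...) is handled by PCRE. Because \w matches a single word character, the pattern only keeps pairs of one-character tokens; the last call shows what happens with longer tokens.

```r
# Input vector from the question
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# Pattern from the answer: capture one word character, optionally followed by
# " - " and one more word character; .* then consumes the rest of the string
pattern <- "(\\w(?:\\s+-\\s+\\w)?).*"

# Replace the whole match with the captured group (\1)
first_pair <- sub(pattern, "\\1", string, perl = TRUE)
print(first_pair)  # values: c("a - b", "z - c", "y", "u - z")

# Caveat: \w is a single character, so multi-character tokens are cut short
longer <- sub(pattern, "\\1", "ab - cd - e", perl = TRUE)
print(longer)      # value: "a"
```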

  • 2021-01-18 08:00

    Note that you cannot use a negated character class to negate a sequence of characters. [^ - ]*$ matches zero or more characters other than a space (yes, it matches - too, because the - creates a range from a space to a space), followed by the end-of-string anchor ($).
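    A quick check in R (a sketch using base grepl() and sub()) shows that [^ - ] behaves like [^ ]: a string containing hyphens but no spaces still matches, and sub() removes only the trailing run of non-space characters.

```r
# [^ - ] is a bracket expression: " - " is a range from space to space,
# so the class is effectively "any character except a space"
no_space <- grepl("^[^ - ]*$", c("a-b-c", "a - b"))
print(no_space)  # values: c(TRUE, FALSE)

# Removing [^ - ]*$ therefore strips only the last space-free token,
# not everything after a " - " sequence
trimmed <- sub("[^ - ]*$", "", "a - b - c - d")
print(trimmed)   # value: "a - b - c - "
```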

    You may use the sub function with the following regex:

    ^(.*? - .*?) - .*
    

    to replace with \1. See the regex demo.

    R code:

    > string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
    > sub("^(.*? - .*?) - .*", "\\1", string)
    [1] "a - b" "z - c" "y"     "u - z"
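    If the separator is always the literal " - ", a non-regex alternative is to split and re-join. The helper keep_before_nth() below is hypothetical (not part of the answer or of base R); it is a sketch that keeps the first n pieces, which for n = 2 gives the same result as the sub() call above.

```r
# Hypothetical helper: keep the first `n` pieces of each string when split on
# the fixed separator `sep`, i.e. drop the n-th occurrence of `sep` and
# everything after it
keep_before_nth <- function(x, n = 2, sep = " - ") {
  parts <- strsplit(x, sep, fixed = TRUE)                 # list of character vectors
  vapply(parts,
         function(p) paste(head(p, n), collapse = sep),   # re-join the first n pieces
         character(1))
}

string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
result <- keep_before_nth(string)
print(result)  # values: c("a - b", "z - c", "y", "u - z")
```

    Using fixed = TRUE treats " - " as a literal string, so no regex escaping is needed.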
    

    Details:

    • ^ - start of a string
    • (.*? - .*?) - Group 1 (referred to with the \1 backreference in the replacement pattern), which lazily captures zero or more characters up to the first space-hyphen-space sequence, and then again zero or more characters up to the next leftmost occurrence of space-hyphen-space
    • " - " - a space, a hyphen and a space
    • .* - zero or more characters up to the end of the string.
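    To see why the lazy .*? quantifiers matter, this R sketch compares the answer's pattern with a greedy variant (the greedy version is added here for illustration only). With .* the group extends up to the last " - " instead of the second one. perl = TRUE is used so both patterns follow Perl backtracking semantics.

```r
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# Lazy (the answer's pattern): Group 1 stops before the second " - "
lazy   <- sub("^(.*? - .*?) - .*", "\\1", string, perl = TRUE)
# Greedy (illustrative variant): Group 1 runs up to the last " - "
greedy <- sub("^(.* - .*) - .*", "\\1", string, perl = TRUE)

print(lazy)    # values: c("a - b", "z - c", "y", "u - z")
print(greedy)  # values: c("a - b - c", "z - c", "y", "u - z")
```

    Strings with fewer than two " - " separators ('y', 'u - z') do not match either pattern, so sub() returns them unchanged.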