How to match “anything up until this sequence of characters” in a regular expression?

旧时难觅i 2020-11-22 11:51

Take this regular expression: /^[^abc]/. This will match any single character at the beginning of a string, except a, b, or c.

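If you add a * after it, as in /^[^abc]*/, the expression keeps matching each subsequent character until it meets an a, a b, or a c. How can I instead match everything up until the exact sequence abc?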
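A quick Python check of the two expressions, ^[^abc] and ^[^abc]*, on a hypothetical sample string. It shows that the character class stops at the first a, b or c individually, not at the sequence abc.

```python
import re

text = "qwerty qwerty whatever abc hello"  # hypothetical sample input

# ^[^abc] matches exactly one leading character that is not a, b or c.
one = re.match(r'^[^abc]', text).group(0)

# Adding * keeps consuming characters until the first a, b or c appears,
# which here is the 'a' inside "whatever", not the sequence "abc".
run = re.match(r'^[^abc]*', text).group(0)

print(repr(one), repr(run))  # 'q' 'qwerty qwerty wh'
```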

12 Answers
  • 2020-11-22 12:15

    As @Jared Ng and @Issun pointed out, the key to solving this kind of problem, such as "matching everything up to a certain word or substring" or "matching everything after a certain word or substring", is the family of zero-length assertions called "lookarounds". Read more about them here.

    In your particular case, it can be solved with a positive lookahead: .+?(?=abc)
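    A minimal Python sketch of the lookahead, using a hypothetical sample string. The lookahead only checks that "abc" follows; it is not part of the returned match.

```python
import re

text = "qwerty qwerty whatever abc hello"  # hypothetical sample input

# .+? consumes as few characters as possible; (?=abc) only asserts that
# "abc" comes next, without including it in the match.
prefix = re.search(r'.+?(?=abc)', text).group(0)

print(repr(prefix))  # 'qwerty qwerty whatever '
```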

    A picture is worth a thousand words; see the detailed explanation in the screenshot.

  • 2020-11-22 12:16

    The $ marks the end of a string, so something like [^abc]*$ should work: it matches a run of characters, none of which is a, b, or c, but that run has to sit at the end of the string.

    Also, if you're using a scripting language with regex support (like PHP or JS), it has a search function that stops at the first occurrence of a pattern. You can usually choose to search from the left or from the right, and in PHP you can also reverse the string with strrev().
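    The same plain-search idea sketched in Python (an illustration, not PHP or JS): str.find searches from the left, str.rfind from the right, and no regex is needed. The sample strings are hypothetical.

```python
text = "qwerty qwerty whatever abc hello"  # hypothetical sample input

# str.find returns the index of the first occurrence, or -1 if absent.
first = text.find("abc")
before_first = text[:first] if first != -1 else text

# str.partition splits around the first occurrence in a single call.
head, sep, tail = text.partition("abc")

# str.rfind searches from the right, i.e. it finds the last occurrence.
other = "x abc y abc z"  # hypothetical input with two occurrences
last = other.rfind("abc")
before_last = other[:last] if last != -1 else other

print(repr(before_first), repr(before_last))  # 'qwerty qwerty whatever ' 'x abc y '
```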

  • 2020-11-22 12:19

    In Python:

    .+?(?=abc) works for the single-line case.

    [^]+?(?=abc) does not work, since Python doesn't accept [^] as a valid regex (in JavaScript it is an idiom for "any character, including newlines"). To make the match span multiple lines, use the re.DOTALL flag, for example:

    import re
    re.findall(r'.+?(?=abc)', data, re.DOTALL)
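    A self-contained sketch of the difference re.DOTALL makes, on a hypothetical two-line input: without the flag, '.' does not match a newline.

```python
import re

data = "line one\nline two abc rest"  # hypothetical two-line input

# Without DOTALL, '.' stops at the newline, so the only possible match
# starts on the second line.
single = re.findall(r'.+?(?=abc)', data)

# With re.DOTALL, '.' also matches '\n', so the match starts at the very
# beginning and spans both lines.
multi = re.findall(r'.+?(?=abc)', data, re.DOTALL)

print(single)  # ['line two ']
print(multi)   # ['line one\nline two ']
```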
    
  • 2020-11-22 12:19

    Try this:

    .+?efg
    

    Query:

    SELECT REGEXP_REPLACE('abcdefghijklmn', '.+?efg', '') FROM dual;
    

    Output:

    hijklmn
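    A rough Python counterpart of the query above, using re.sub in place of Oracle's REGEXP_REPLACE. Note that .+?efg removes "efg" along with the prefix, while a lookahead version leaves it in place.

```python
import re

# Remove the shortest prefix that ends with "efg" (the match includes "efg").
removed = re.sub(r'.+?efg', '', 'abcdefghijklmn')

# With a lookahead, "efg" is only asserted, not matched, so it survives
# the replacement.
kept = re.sub(r'.+?(?=efg)', '', 'abcdefghijklmn')

print(removed, kept)  # hijklmn efghijklmn
```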
    
  • 2020-11-22 12:20

    I believe you need subexpressions. If I remember right, you can use ordinary parentheses () for subexpressions.

    This part is from the grep manual:

     Back References and Subexpressions
           The back-reference \n, where n is a single digit, matches the substring
           previously matched  by  the  nth  parenthesized  subexpression  of  the
           regular expression.
    

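    Something like ^[^(abc)] should do the trick. Note, however, that parentheses lose their grouping meaning inside square brackets, so [^(abc)] is a single-character class that excludes the five characters (, a, b, c and ).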
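    A small Python check of what the bracket expression and a parenthesized subexpression with a back-reference actually match. The input strings are hypothetical.

```python
import re

# Inside [...] parentheses are ordinary characters, so [^(abc)] is a
# one-character class excluding '(', 'a', 'b', 'c' and ')'.
one_char = re.match(r'^[^(abc)]', 'qwerty abc').group(0)
no_match = re.match(r'^[^(abc)]', '(abc')

# Outside brackets, (ab) is a parenthesized subexpression, and the
# back-reference \1 must match the same text again.
repeated = re.search(r'(ab)c\1', 'xxabcabyy').group(0)

print(one_char, no_match, repeated)  # q None abcab
```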

  • 2020-11-22 12:23

    If you're looking to capture everything up to "abc":

    /^(.*?)abc/
    

    Explanation:

    ( ) capture the expression inside the parentheses for access using $1, $2, etc.

    ^ match start of line

    .* match anything, ? non-greedily (match the minimum number of characters required) - [1]

    [1] The reason why this is needed is that otherwise, in the following string:

    whatever whatever something abc something abc
    

    quantifiers are greedy by default, meaning they match as much as possible. Therefore /^(.*)abc/ would capture "whatever whatever something abc something " in $1. Adding the non-greedy quantifier ? makes the group capture only "whatever whatever something ".
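    The same greedy versus non-greedy comparison sketched in Python, where match.group(1) plays the role of $1.

```python
import re

s = "whatever whatever something abc something abc"

# Greedy: .* runs to the end, then backtracks only as far as the last "abc".
greedy = re.match(r'^(.*)abc', s).group(1)

# Non-greedy: .*? grows one character at a time and stops at the first "abc".
lazy = re.match(r'^(.*?)abc', s).group(1)

print(repr(greedy))  # 'whatever whatever something abc something '
print(repr(lazy))    # 'whatever whatever something '
```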
