Algorithm to create n-th level of nested patterns in RegEx

后端 未结 1 1281
-上瘾入骨i
-上瘾入骨i 2021-01-14 19:08

As explained in Can regular expressions be used to match nested patterns?, it is not possible to create regex to match arbitrary nested pattern. But is it possible to create

相关标签:
1条回答
  • 2021-01-14 19:27

    You can think about it more theoretically: a match for parenthesis nested n deep is just parenthesis around matches for n-1 or less deep (with at least one exactly n-1 deep).

    We can give a recursive definition of the regexes. Let X[n] be the regex for nesting exactly n levels, and Y[n] be the regex for a string containing brackets with any level of nesting up to n levels, so:

    X[n] = \( (Y[n-2] X[n-1])+ Y[n-2] \)
    
    Y[n] = [^()]* ( \( Y[n-1] \) [^()]* )*
    

    with Y[0] = X[0] = [^()]* (no nesting) and X[1] = \([^()]*\). (I'm not bothering with the details of non-capturing groups etc yet, and the spaces are just for readability.)

    Writing an algorithm based on this should be quite easy.


    The regexes from these new (less mutually recursive) definitions get longer much much more slowly (they are polynomial rather than exponential).

    Let l[n] be the length of X[n], and L[n] be the length of Y[n], then (the constant terms are just the hardcoded characters in each one):

    L[n] = 19 + L[n-1] = 19*n + L[0] = 19*n + 6
    
    l[n] = 3 + L[n-2] + l[n-1] + 2 + L[n-2] + 2
         = 7 + 2 * L[n-2] + l[n-1]
         = -57 + 38 * n + l[n-1]
    

    with the appropriate initial conditions for l[0] and l[1]. Recurrence relations of this form have quadratic solutions, so this is only O(n^2). Much better.

    (For others, I had a previous definition of Y[n] was Y[n] = Y[n-1] | X[n]; this extra recursion meant that the length of the X regex was O(2.41^n), which sucks a lot.)

    (The new definition of Y is a tantalising hint that there might even be a way of writing X that is linear in n. I don't know though, and I have a feeling the extra restriction on X of exact length means it is impossible.)


    The following is some Python code that computes the regexes above, you can probably translate it to javascript without too much trouble.

    # abbreviation for the No Parenthesis regex
    np = "[^()]*"
    
    # compute Y[n] from Y[n-1]
    def next_y(y_n1):
        return np + "(?:\(" + y_n1 + "\)" + np + ")*"
    
    # compute X[n] from X[n-1] and Y[n-2]
    def next_x(x_n1, y_n2):
        return "\((?:" + y_n2 + x_n1 + ")+" + y_n2 + "\)"
    
    # compute [X[n], Y[n], Y[n-1]]
    # (to allow us to make just one recursive call at each step)
    def XY(n):
        if n == 0:
            return [np, # X[0]
                    np, # Y[0]
                    ""] # unused
        elif n == 1:
            return ["\([^()]*\)", # X[1]
                    next_y(np),   # Y[1]
                    np]           # Y[0]
    
        x_n1, y_n1, y_n2 = XY(n-1) # X[n-1], Y[n-1], Y[n-2]
    
        return [next_x(x_n1, y_n2), # X[n]
                next_y(y_n1),       # Y[n]
                y_n1]               # Y[n-1]
    
    # wrapper around XY to compute just X[n]
    def X(n):
        return XY(n)[0]
    
    # wrapper around XY to compute just Y[n]
    def Y(n):
        return XY(n)[1]
    
    0 讨论(0)
提交回复
热议问题