In regex, match either the end of the string or a specific character

渐次进展 2020-11-22 07:19

I have a string whose ending varies, such as index.php?test=1&list=UL or index.php?list=UL&more=1. The one thing I'm looking for is

2 Answers
  • 2020-11-22 07:45

    Use:

    /(&|\?)list=.*?(&|$)/
    

    Note that when you use a bracket expression, every character within it (with some exceptions) is going to be interpreted literally. In other words, [&|$] matches the characters &, |, and $.
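
To make the difference concrete, here is a minimal Python sketch (the sample URL comes from the question; the variable names are only illustrative). It assumes Python's `re` module, which treats `[&|$]` the same way as described above.

```python
import re

url = "index.php?test=1&list=UL"

# Group with alternation: "&" OR the end of the string.
grouped = re.search(r"(&|\?)list=.*?(&|$)", url)

# Bracket expression: exactly one literal character out of "&", "|", "$".
bracketed = re.search(r"(&|\?)list=.*?[&|$]", url)

# The class matches the three characters literally.
literal_hits = re.findall(r"[&|$]", "a&b|c$d")

print(grouped.group(0))  # &list=UL
print(bracketed)         # None
print(literal_hits)      # ['&', '|', '$']
```

The bracketed version fails here because the string ends right after `UL`, and a character class always has to consume one character.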

  • 2020-11-22 08:03

    In short

    Any zero-width assertion placed inside [...] loses its meaning as a zero-width assertion. [\b] does not match a word boundary (it matches a backspace, or, in POSIX, \ or b), [$] matches a literal $ char, and [^] is either an error or, as in the ECMAScript regex flavor, any char. The same applies to the \z, \Z and \A anchors.
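
A short sketch of this behaviour in Python's `re` module (one flavor only; other engines may differ, as noted above). The test strings and variable names are made up for illustration.

```python
import re

# Inside a class, \b is the backspace character (\x08), not a word boundary.
backspace_hits = re.findall(r"[\b]", "a b\x08c")

# Inside a class, $ is a literal dollar sign, not the end-of-string anchor.
dollar_hits = re.findall(r"[$]", "cost: $5")

# Python rejects [^] as an unterminated class; ECMAScript would accept it.
try:
    re.compile(r"[^]")
    caret_class_compiles = True
except re.error:
    caret_class_compiles = False

print(backspace_hits)        # ['\x08']
print(dollar_hits)           # ['$']
print(caret_class_compiles)  # False
```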

    You may solve the problem using any of the below patterns:

    [&?]list=([^&]*)
    [&?]list=(.*?)(?=&|$)
    [&?]list=(.*?)(?![^&])
    

    Matching between a char sequence and a single char or end of string (current scenario)

    The .*?([YOUR_SINGLE_CHAR_DELIMITER(S)]|$) pattern (suggested by João Silva) is rather inefficient since the regex engine checks for the patterns that appear to the right of the lazy dot pattern first, and only if they do not match does it "expand" the lazy dot pattern.

    In these cases it is recommended to use a negated character class (a negated bracket expression, in POSIX terms):

    [&?]list=([^&]*)
    

    Details:

    • [&?] - a positive character class matching either & or ? (note the relationships between chars/char ranges in a character class are OR relationships)
    • list= - a substring, char sequence
    • ([^&]*) - Capturing group #1: zero or more (*) chars other than & ([^&]), as many as possible
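
As a quick check, the pattern can be run against both URLs from the question. This is a minimal Python sketch; `pattern`, `urls` and `values` are illustrative names, not part of the original answer.

```python
import re

# Negated character class: capture everything up to the next "&" (or the end).
pattern = re.compile(r"[&?]list=([^&]*)")

urls = [
    "index.php?test=1&list=UL",   # value sits at the end of the string
    "index.php?list=UL&more=1",   # value is followed by "&"
]

values = [pattern.search(u).group(1) for u in urls]
print(values)  # ['UL', 'UL']
```

Because `[^&]*` simply stops at the first `&` or at the end of the input, no explicit end-of-string alternative is needed.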

    Checking for the trailing single char delimiter presence without returning it or end of string

    Most regex flavors support lookarounds (JavaScript has always supported lookaheads and added lookbehinds in ECMAScript 2018). These constructs consume no text; they only return true or false depending on whether their patterns match. They are crucial when consecutive matches that may start and end with the same char are expected (see the original pattern: it may match a string starting and ending with &). Although that is not expected in a query string, it is a common scenario.
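
The following Python sketch illustrates why consuming the trailing delimiter matters for consecutive matches. The input string is a made-up example, not taken from the question.

```python
import re

text = "&a&b&c&"

# The trailing "&" is consumed, so it cannot start the next match: "b" is skipped.
consuming = re.findall(r"&(.*?)&", text)

# The lookahead only checks for "&" without consuming it, so every item is found.
with_lookahead = re.findall(r"&(.*?)(?=&)", text)

print(consuming)       # ['a', 'c']
print(with_lookahead)  # ['a', 'b', 'c']
```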

    In that case, you can use two approaches:

    • A positive lookahead with an alternation containing positive character class: (?=[SINGLE_CHAR_DELIMITER(S)]|$)
    • A negative lookahead with just a negative character class: (?![^SINGLE_CHAR_DELIMITER(S)])

    The negative lookahead solution is a bit more efficient because it does not contain an alternation group, which adds complexity to the matching procedure. For the OP's case, the solution would look like

    [&?]list=(.*?)(?=&|$)
    

    or

    [&?]list=(.*?)(?![^&])
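
A minimal Python sketch comparing the two lookahead variants on the URLs from the question (variable names are illustrative). Both should capture the same value and leave the trailing `&` out of the match.

```python
import re

urls = ["index.php?test=1&list=UL", "index.php?list=UL&more=1"]

# Positive lookahead: the next thing is "&" or the end of the string.
positive = re.compile(r"[&?]list=(.*?)(?=&|$)")

# Negative lookahead: the next char is NOT something other than "&",
# which is true both before "&" and at the end of the string.
negative = re.compile(r"[&?]list=(.*?)(?![^&])")

positive_values = [positive.search(u).group(1) for u in urls]
negative_values = [negative.search(u).group(1) for u in urls]
full_match = negative.search(urls[1]).group(0)

print(positive_values)  # ['UL', 'UL']
print(negative_values)  # ['UL', 'UL']
print(full_match)       # ?list=UL
```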
    


    Certainly, in case the trailing delimiters are multichar sequences, only a positive lookahead solution will work since [^yes] does not negate a sequence of chars, but the chars inside the class (i.e. [^yes] matches any char but y, e and s).
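
To illustrate the multichar case, here is a small Python sketch with a hypothetical `key=` prefix and the delimiter `yes`; the sample string is invented for the example.

```python
import re

text = "key=easy stuff yes more"

# Positive lookahead: stops right before the full sequence "yes" (or the end).
sequence_aware = re.search(r"key=(.*?)(?=yes|$)", text).group(1)

# Negated class: stops before ANY of the chars y, e, s, here the very first "e".
char_based = re.search(r"key=(.*?)(?![^yes])", text).group(1)

print(repr(sequence_aware))  # 'easy stuff '
print(repr(char_based))      # ''
```

The second pattern returns an empty capture because the first character after `key=` is `e`, which the class `[^yes]` refuses to match.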
