Regular Expressions: How to get the effect of an AND THEN operator in compound expression?

梦毁少年i 2021-01-23 09:11

I'm struggling to work with regular expressions. I think I understand the individual expressions, but combining them has me completely stumped. I don't grasp th…

3 answers
  • 2021-01-23 09:27

    You've almost got it. It's really as simple as replacing 'or' with | and replacing 'and' with concatenation. Then make your groups non-capturing by adding ?: right after each opening parenthesis:

    (?:<|<\/)(?:[1-9]|[1-4][0-9]|[5][0-7])>

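    MDN documents how split behaves when the separator is a regex. The short version: text matched by a capturing group in the separator is included in the result array, while a non-capturing group acts like a plain separator: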

    'hi_joe'.split('_'); // ['hi', 'joe']
    'hi_joe'.split(/_/); // ['hi', 'joe']
    'hi_joe'.split(/(_)/); // ['hi', '_', 'joe']
    'hi_joe'.split(/(?:_)/); // ['hi', 'joe']
    

    Update per comment: if you'd like the <##> tags in your results array as well, wrap the whole regex in an additional set of (capturing) parentheses:

    ((?:<|<\/)(?:[1-9]|[1-4][0-9]|[5][0-7])>)
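A minimal sketch comparing the two versions of the regex from this answer when passed to split. The sample string is made up for illustration; only the outer capturing group decides whether the tags survive in the result.

```javascript
// Without the outer parentheses: the tags act as plain separators and are dropped.
const noCapture = /(?:<|<\/)(?:[1-9]|[1-4][0-9]|[5][0-7])>/;
// With the outer capturing group: split also puts each matched tag in the result.
const withCapture = /((?:<|<\/)(?:[1-9]|[1-4][0-9]|[5][0-7])>)/;

const input = "This is a<21>test</21>.";
console.log(input.split(noCapture));   // [ 'This is a', 'test', '.' ]
console.log(input.split(withCapture)); // [ 'This is a', '<21>', 'test', '</21>', '.' ]
```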

  • 2021-01-23 09:36

    OK, let's start with the number part. It's fine, except that there's technically no need to put a single symbol in brackets, so [5] can simply be 5:

     [1-9] | [1-4][0-9] | 5[0-7]
    

    (using spaces here and below for clarity).
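To check which numbers the pattern accepts, here is a small sketch that tests every integer from 0 to 99. The ^ and $ anchors are an addition for this test only, so that the pattern has to match the whole number rather than just a prefix of it (without them, "58" would still match because of the leading "5").

```javascript
// The number part from above, anchored so it must match the whole string.
const num = /^(?:[1-9]|[1-4][0-9]|5[0-7])$/;

// Collect every integer from 0 to 99 that the pattern accepts.
const accepted = [];
for (let n = 0; n <= 99; n++) {
  if (num.test(String(n))) accepted.push(n);
}
console.log(accepted[0], accepted[accepted.length - 1], accepted.length); // 1 57 57
```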

    For the first part, an alternation like a | ab reads better when written as ab?, that is, "a, and then, optionally, b". That gives us

     < \/ ?
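A quick sketch showing that the alternation and the optional form accept exactly the same strings. The sample strings are arbitrary, and the anchors are added so each pattern must match the entire sample.

```javascript
// Two ways of writing "<, and then, optionally, /".
const alternation = /^(?:<|<\/)$/;
const optional = /^<\/?$/;

const samples = ["<", "</", "/", "<<", ""];
for (const s of samples) {
  // Both columns agree for every sample: true only for "<" and "</".
  console.log(JSON.stringify(s), alternation.test(s), optional.test(s));
}
```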
    

    Now, the "and" (or rather "and then") operator you were looking for is very simple in the regex language: it's nothing at all. That is, "a and then b" is just ab.
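A small sketch of "and then" as plain concatenation: each piece must match, in order, right after the previous one. The pattern is a simplified, anchored fragment chosen for illustration (only the 10 to 49 range of numbers).

```javascript
// "<", and then an optional "/", and then a two-digit number from 10 to 49.
const tagStart = /^<\/?[1-4][0-9]$/;

console.log(tagStart.test("<21"));  // true:  "<", no "/", "21"
console.log(tagStart.test("</21")); // true:  "<", "/", "21"
console.log(tagStart.test("21<"));  // false: the pieces must appear in this order
console.log(tagStart.test("<2"));   // false: [1-4][0-9] needs two digits
```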

    However, if we combine both parts simply like this

    a  x | y | z
    

    that would be a mistake, because | has the lowest precedence of all regex operators, so that would be interpreted as

    ax | y | z
    

    which is not what we want. So we need to put the number part in parentheses, and, for reasons explained below, these parentheses also have to be non-capturing:

    <\/?  (?: [1-9] | [1-4][0-9] | 5[0-7] )  >
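To see the precedence problem concretely, here is a sketch comparing an ungrouped and a grouped version of the pattern (closing > omitted for brevity). The test string is made up and contains no tag at all, yet the ungrouped version still finds a match.

```javascript
// Ungrouped: parsed as (<\/?[1-9]) | ([1-4][0-9]) | (5[0-7]).
const ungrouped = /<\/?[1-9]|[1-4][0-9]|5[0-7]/;
// Grouped: the "<" prefix applies to every alternative.
const grouped = /<\/?(?:[1-9]|[1-4][0-9]|5[0-7])/;

const text = "page 42";            // contains no "<" at all
console.log(ungrouped.test(text)); // true: "42" matches the second alternative on its own
console.log(grouped.test(text));   // false
console.log(grouped.test("<42"));  // true
```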
    

    This matches our delimiters, but we also need everything in between, so we're going to split the input. Normally, split returns an array of the pieces between the delimiters, with the delimiters themselves left out:

    "a,b,c".split(/,/)   // ['a', 'b', 'c']
    

    If we want to include the delimiter too, it has to be placed in a capturing group:

    "a,b,c".split(/(,)/) // ['a', ',', 'b', ',', 'c']
    

    so we have to wrap everything in parens once again:

    (  <\/?  (?: [1-9] | [1-4][0-9] | 5[0-7] )  >  )
    

    and that's the reason for the ?:. We want the whole delimiter to be captured, but not the number part on its own; otherwise split would insert the bare number into the result as well.
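A sketch of what goes wrong without the ?:. Here the inner group is deliberately left capturing, so split inserts the contents of both groups into the result, one after the other.

```javascript
// Same delimiter regex, but with the inner group made capturing (no ?:).
const s = "This is a<21>test</21>.";
const innerCapturing = /(<\/?([1-9]|[1-4][0-9]|5[0-7])>)/;

console.log(s.split(innerCapturing));
// [ 'This is a', '<21>', '21', 'test', '</21>', '21', '.' ]
```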

    Putting it all together seems to do the trick:

    const s = "This is a<21>test</21>.";

    console.log(s.split(/(<\/?(?:[1-9]|[1-4][0-9]|5[0-7])>)/));
    // [ 'This is a', '<21>', 'test', '</21>', '.' ]

    Hope this sheds some light

  • 2021-01-23 09:37

    The way I understand regex is that, unless you intentionally specify otherwise (e.g. with an OR clause, |), everything you write in a regex is implicitly joined by AND (more precisely, AND THEN). [a-z] matches one character, whereas [a-z][a-z] matches one character AND THEN another character.
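A tiny sketch of this implicit AND THEN. The ^ and $ anchors are added so each pattern has to match the whole test string.

```javascript
const one = /^[a-z]$/;      // exactly one lowercase letter
const two = /^[a-z][a-z]$/; // one lowercase letter AND THEN another

console.log(one.test("a"), one.test("ab"));                 // true false
console.log(two.test("ab"), two.test("a"), two.test("a1")); // true false false
```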

    Depending on your use case, the regex below could be what you need. It captures everything between <number> and </number>.

    <[1-5][0-9]>([\s\S]*?)<\/[1-5][0-9]>
    
    <[1-5][0-9]> matches <number> where number is between 10 and 59.
    [\s\S]*? matches any character, including newlines, zero or more times, as few times as possible (lazily).
    </[1-5][0-9]> matches </number> where number is between 10 and 59.
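Note that the opening and closing numbers are matched independently, so a pair like <10>…</42> is accepted too. The sketch below shows this, together with a hypothetical variant (not part of the answer above) that uses the backreference \1 to require the same number in both tags. In that variant the content moves to capture group 2, because group 1 now holds the number.

```javascript
// The regex from this answer: opening and closing numbers are matched independently.
const loose = /<[1-5][0-9]>([\s\S]*?)<\/[1-5][0-9]>/;
// Hypothetical variant: capture the opening number and require the same one via \1.
const paired = /<([1-5][0-9])>([\s\S]*?)<\/\1>/;

const mismatched = "<10>Hello</42>";
console.log(loose.test(mismatched));  // true
console.log(paired.test(mismatched)); // false

// [\s\S] also spans the newline inside the content.
const m = "<10>Hello\nworld</10>".match(paired);
console.log(m[1], JSON.stringify(m[2])); // 10 "Hello\nworld"
```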
    

    Here is a snippet returning everything between <number> and </number>. It converts the matches to an array and takes the first capture group of the first match, which is everything between the tags, as the parentheses in the regex show.

    let str = '<10>Hello, world!</10>';
    
    let reg = /<[1-5][0-9]>([\s\S]*?)<\/[1-5][0-9]>/g;
    
    let matches = Array.from( str.matchAll(reg) );
    
    console.log(matches[0][1]);
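If the input contains several tagged sections, a small extension of the snippet above collects the first capture group of every match instead of only the first one. The input string is made up; Array.from's optional second argument maps each match object to its group 1. The lazy *? is what keeps "Hello" and "world" separate.

```javascript
const str = "<10>Hello</10> and <25>world</25>";
const reg = /<[1-5][0-9]>([\s\S]*?)<\/[1-5][0-9]>/g; // matchAll requires the g flag

// Map each match to its first capture group (the text between the tags).
const inner = Array.from(str.matchAll(reg), (m) => m[1]);
console.log(inner); // [ 'Hello', 'world' ]
```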
