RegEx for capturing a repeating pattern

Question

I have the following regex, taken from the question "regex capturing with repeating pattern":

([0-9]{1,2}h)[ ]*([0-9]{1,2}min):[ ]*(.*(?:\n(?![0-9]{1,2}h).*)*)

It takes the following string

1h 30min: Title 
- Description Line 1
1h 30min: Title
- Description Line 1
- Description Line 2
- Description Line 3

And produces this as a result

Match 1:
  "1h 30min: Title 
  - Description Line 1"

      Group 1: "1h"
      Group 2: "30min"
      Group 3: "Title 
               - Description Line 1"

Match 2:
  "1h 30min: Title 
 - Description Line 1
 - Description Line 2
 - Description Line 3"

      Group 1: "1h"
      Group 2: "30min"
      Group 3: "Title 
               - Description Line 1
               - Description Line 2
               - Description Line 3"

Now the 1h 30min marker does not always occur at the start of a new line. Say I had the following string:

1h 30min: Title 
- Description Line 1 1h 30min: Title - Description Line 1
- Description Line 2
- Description Line 3

How can I modify the regex to get the following matched result?

Match 1:
  "1h 30min: Title 
  - Description Line 1"

      Group 1: "1h"
      Group 2: "30min"
      Group 3: "Title 
               - Description Line 1"

Match 2:
  "1h 30min: Title - Description Line 1
 - Description Line 2
 - Description Line 3"

      Group 1: "1h"
      Group 2: "30min"
      Group 3: "Title - Description Line 1
               - Description Line 2
               - Description Line 3"

I thought removing the \n would do the trick, but it just ends up matching everything after the first 1h 30min.

Answer 1:

You can make this work with only minor changes. The problem is the last part of the pattern, which only checks for a new 1h 30min entry right after a newline. The general form of a tempered greedy token is this:

(.(?!notAllowed))+

So, applying this pattern to your case and adding named groups for clarity:

(?<hours>[0-9]{1,2}h)[ ]*(?<minutes>[0-9]{1,2}min):\s*(?<description>(?:.(?!\dh\s\d{1,2}min))+)

PS: the dot has to match newlines for the description to span several lines. If you cannot turn on a "dot matches newline" mode, you may be able to use [\s\S] in place of the dot to simulate it.
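As a rough check of the pattern above, here is a minimal JavaScript sketch that runs it against the sample string from the question. The `g` and `s` flags and the variable names (`input`, `entryPattern`, `entries`) are assumptions made for illustration, not part of the original answer.

```javascript
// Sample input from the question, built line by line so the trailing
// space after the first "Title" is explicit.
const input = [
  "1h 30min: Title ",
  "- Description Line 1 1h 30min: Title - Description Line 1",
  "- Description Line 2",
  "- Description Line 3",
].join("\n");

// Pattern from the answer. The flags are an assumption:
// g = find every entry, s = let "." match newlines (dotAll).
const entryPattern =
  /(?<hours>[0-9]{1,2}h)[ ]*(?<minutes>[0-9]{1,2}min):\s*(?<description>(?:.(?!\dh\s\d{1,2}min))+)/gs;

// Collect the named groups of every match into plain objects.
const entries = [...input.matchAll(entryPattern)].map((m) => ({ ...m.groups }));

for (const entry of entries) {
  console.log(entry);
}
```

Because the lookahead sits after the dot, the character immediately before the next timestamp (here the separating space) is left out of the description. The more common spelling, (?:(?!notAllowed).)+, would include that space.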


Answer 2:

I couldn't solve it with minor changes, so here is my own solution:

([0-9]{1,2}h) *([0-9]{1,2}min):[\s\S]*?(?=[0-9]{1,2}h|$)
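The pattern above has no capture group for the description, so the following JavaScript sketch adds one around the lazy part. That extra group, the `g` flag, the trimming, and the variable names are all assumptions added for illustration; the rest is the pattern as posted.

```javascript
// Sample input from the question.
const input = [
  "1h 30min: Title ",
  "- Description Line 1 1h 30min: Title - Description Line 1",
  "- Description Line 2",
  "- Description Line 3",
].join("\n");

// The answer's pattern with one assumed addition: a third capture group
// around the lazy [\s\S]*? so the description can be read separately.
// No "m" flag is used, so "$" only matches at the very end of the string.
const lazyPattern = /([0-9]{1,2}h) *([0-9]{1,2}min):([\s\S]*?)(?=[0-9]{1,2}h|$)/g;

const results = [...input.matchAll(lazyPattern)].map((m) => ({
  hours: m[1],
  minutes: m[2],
  // The lazy part keeps the whitespace around the text, so trim it.
  description: m[3].trim(),
}));

for (const result of results) {
  console.log(result);
}
```

One thing to keep in mind with this approach: the lookahead stops at any one or two digits followed by "h", so a description that itself contains something like "2h" would be cut short at that point.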

Answer 3:

The desired output is quite difficult to match, yet not impossible.

I would do only part of it with regular expressions, perhaps the time and title, and then handle the rest with scripting, if that is acceptable.

Here, we can start with an expression similar to:

([0-9]{1,2}h)\s+([0-9]{1,2}min):\s+(Title)([\d\D]*?\d|.+)|[\s\S]*

or:

([0-9]{1,2}h)\s+([0-9]{1,2}min):\s+([A-Za-z\s]+)([\d\D]*?\d|.+)|[\s\S]*

const regex = /([0-9]{1,2}h)\s+([0-9]{1,2}min):\s+(Title)([\d\D]*?\d|.+)|[\s\S]*/gm;
const str = `1h 30min: Title 
- Description Line 1 1h 30min: Title - Description Line 1
- Description Line 2
- Description Line 3`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
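The idea of matching only the time header with a regular expression and leaving the rest to scripting could be sketched roughly as follows. This is not the answer's own code: the helper name `splitEntries`, the header pattern, and the trimming are assumptions chosen for illustration.

```javascript
// Hypothetical helper: find every "<n>h <n>min:" header with a regex,
// then slice out the text between one header and the next with plain code.
function splitEntries(text) {
  const headerPattern = /([0-9]{1,2}h)[ ]*([0-9]{1,2}min):[ ]*/g;
  const headers = [...text.matchAll(headerPattern)];

  return headers.map((header, i) => {
    // The description starts right after this header...
    const start = header.index + header[0].length;
    // ...and ends where the next header begins, or at the end of the text.
    const end = i + 1 < headers.length ? headers[i + 1].index : text.length;
    return {
      hours: header[1],
      minutes: header[2],
      description: text.slice(start, end).trim(),
    };
  });
}

const sample = [
  "1h 30min: Title ",
  "- Description Line 1 1h 30min: Title - Description Line 1",
  "- Description Line 2",
  "- Description Line 3",
].join("\n");

const parsed = splitEntries(sample);
for (const entry of parsed) {
  console.log(entry);
}
```

Keeping the regex limited to the header avoids lookaheads entirely, at the cost of a few extra lines of slicing logic.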

Source: https://stackoverflow.com/questions/56421027/regex-for-capturing-a-repeating-pattern

Tags

regex

regex-lookarounds

regex-group

regex-greedy