Question
The following is an example of a list of multiline records, each starting with a fixed string label (LABEL):
<Irrelevant line>
...
<Irrelevant line>
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
Is there a Java regular expression that can match the above and extract each record, i.e. each block of the form:
LABEL ...
...
...
Also, is a regex the fastest way to extract those records, or would reading line by line and checking the start of each line be faster?
Answer 1:
To iterate over all the LABEL groups, use this:
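import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern regex = Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");
Matcher regexMatcher = regex.matcher(subjectString); // subjectString holds the whole input text
while (regexMatcher.find()) {
    // the current LABEL group (one whole record)
    String record = regexMatcher.group();
}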
See the demo for the various matches.
Explanation
- (?s) activates DOTALL mode, allowing the dot to match across lines
- (?m) turns on multi-line mode, allowing ^ and $ to match at the start and end of each line
- LABEL matches the literal characters
- .*? lazily matches all characters up to the point where the lookahead (?=^LABEL|\Z) can assert that what follows is the next LABEL at the start of a line, or the end of the string
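To see what Answer 1's pattern actually returns, here is a minimal, self-contained sketch that runs it over a small input. The class name LabelRecordsRegex, the helper extractRecords and the sample text are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelRecordsRegex {
    // Answer 1's pattern: DOTALL + MULTILINE, lazily matching up to the
    // next line-initial LABEL or the end of the input.
    private static final Pattern RECORD =
            Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");

    // Returns every LABEL record found in the input, in order.
    static List<String> extractRecords(String input) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two irrelevant lines, then two records.
        String input = "junk 1\njunk 2\nLABEL a\nx\ny\nLABEL b\nz\n";
        for (String record : extractRecords(input)) {
            System.out.println("--- record ---");
            System.out.println(record.trim()); // trim hides the trailing-newline difference
        }
    }
}
```

Because \Z also matches just before a final line terminator, the last record comes back without its trailing newline while earlier records keep theirs (here "LABEL a\nx\ny\n" versus "LABEL b\nz"). Also, the leading LABEL is not anchored, so a LABEL appearing mid-line would start a record; writing (?sm)^LABEL.*?(?=^LABEL|\Z) restricts matches to line starts.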
Answer 2:
I think you can start with the expression:
^LABEL\s*\w*
or
^LABEL.*
It may need some refinement, but it is at least a starting point.
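Answer 2 does not say which flags to use; the sketch below assumes Pattern.MULTILINE so that ^ can match at every line start. It shows that ^LABEL.* on its own finds only the first line of each record, because without DOTALL the dot stops at a line terminator. The class LabelLineRegex, the helper findAll and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelLineRegex {
    // Returns every substring matched by the given pattern, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two records.
        String input = "junk\nLABEL a\nx\nLABEL b\ny\n";
        // MULTILINE lets ^ match at each line start; without DOTALL, .*
        // stops at the end of the line, so only the LABEL lines match.
        Pattern labelLine = Pattern.compile("^LABEL.*", Pattern.MULTILINE);
        System.out.println(findAll(labelLine, input)); // [LABEL a, LABEL b]
    }
}
```

This is enough to locate where each record begins, but collecting the record bodies still needs extra work, either a DOTALL pattern with a lookahead or a line-by-line loop.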
Answer 3:
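The following would match every line that starts with the string LABEL (in Java this needs multi-line mode, e.g. the (?m) flag, so that ^ matches at the start of each line rather than only at the start of the input):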
(?=^LABEL).*
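A small sketch of why the flag matters for Answer 3's pattern: compiled as-is, ^ only matches at the very start of the input, so an input whose first line is not a LABEL line yields no matches at all. With (?m) it behaves like ^LABEL.* from Answer 2 and matches each LABEL line. The class LookaheadLabelRegex, the helper countMatches and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadLabelRegex {
    // Counts how many matches the regex finds in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical input whose first line is not a LABEL line.
        String input = "junk\nLABEL a\nx\nLABEL b\ny\n";
        // Without MULTILINE, ^ only matches at the very start of the input.
        System.out.println(countMatches("(?=^LABEL).*", input)); // 0
        // With the inline (?m) flag, ^ matches at the start of every line.
        System.out.println(countMatches("(?m)(?=^LABEL).*", input)); // 2
    }
}
```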
Answer 4:
In my view, you can read the stream line by line and check whether each line starts with "LABEL". You could use the substring method, like this:
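// The length check avoids StringIndexOutOfBoundsException on short lines; line.startsWith("LABEL") is equivalent.
boolean isRecordStart = line.length() >= "LABEL".length() && line.substring(0, "LABEL".length()).equals("LABEL");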
In my view, regular expressions are most useful for finding patterns, not a specific fixed text.
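Answer 4's suggestion can be fleshed out into a single-pass splitter. This is a minimal sketch, assuming the records arrive through a java.io.Reader (for a file, for example one obtained from Files.newBufferedReader); the class LineByLineRecords, the method readRecords and the sample input are hypothetical. It uses String.startsWith, which already copes with lines shorter than the label.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineByLineRecords {
    // Hypothetical label string; replace with the real one.
    private static final String LABEL = "LABEL";

    // Groups lines into records: a record starts at a line beginning with LABEL
    // and runs until the next such line. Lines before the first LABEL are skipped.
    // Each stored line ends with '\n', whatever the original terminator was.
    static List<String> readRecords(Reader source) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // stays null until the first LABEL line
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(LABEL)) {
                    if (current != null) {
                        records.add(current.toString()); // close the previous record
                    }
                    current = new StringBuilder();
                }
                if (current != null) {
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            records.add(current.toString()); // flush the last record
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "junk 1\njunk 2\nLABEL a\nx\ny\nLABEL b\nz\n";
        for (String record : readRecords(new StringReader(input))) {
            System.out.println("--- record ---");
            System.out.print(record);
        }
    }
}
```

As for speed, neither version has been benchmarked here. A linear scan with startsWith does no backtracking and does not need the whole input in memory, which often makes it competitive with the regex, but it is worth measuring on real data before choosing.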
Source: https://stackoverflow.com/questions/24605556/java-regex-to-match-multiline-records-starting-with-fixed-label