Question
Disclaimer, before this gets auto-closed: this is NOT the same question as this one:
How do you access the matched groups in a JavaScript regular expression?
Let's say I have this regular expression:
const regex = /(\w+) count: (\d+)/
Is there a way I can extract the capture groups so that I have:
[ '\w+', '\d+' ]
Answer 1:
As others pointed out, you'd need a real parser, for example one generated with Lex & Yacc. You can, however, use regex and some recursion to parse nested structures. For details, see https://twiki.org/cgi-bin/view/Blog/BlogEntry201109x3
Here is a JavaScript version that parses nested groups properly. The default test input is (\w+) count: (\d+), number: (-?\d+(\/\d+)?), i.e. three groups at level 0 and one group nested at level 1 inside the third group:
// configuration:
const ctrlChar = '~'; // in real use, pick a non-printable character such as '\x01'
const cleanRegex = new RegExp(ctrlChar + '\\d+' + ctrlChar, 'g');

function parseRegex(str) {
  // build a regex that matches one group annotated with the given nesting level
  function _levelRegx(level) {
    return new RegExp('(' + ctrlChar + level + ctrlChar + ')\\((.*?)(' + ctrlChar + level + ctrlChar + ')\\)', 'g');
  }
  // replace callback: record the group body, then descend into the next level
  function _extractGroup(m, p1, p2, p3) {
    //console.log('m: ' + m + ', p1: ' + p1 + ', p2: ' + p2 + ', p3: ' + p3);
    groups.push(p2.replace(cleanRegex, ''));
    let nextLevel = parseInt(p1.replace(/\D/g, ''), 10) + 1;
    p2 = p2.replace(_levelRegx(nextLevel), _extractGroup);
    return '(' + p2 + ')';
  }
  // annotate unescaped parentheses with their nesting level:
  let level = 0;
  str = str.replace(/(?<!\\)[\(\)]/g, function(m) {
    if(m === '(') {
      return ctrlChar + (level++) + ctrlChar + m;
    } else {
      return ctrlChar + (--level) + ctrlChar + m;
    }
  });
  console.log('nesting: ' + str);
  // recursively extract groups:
  let groups = [];
  level = 0;
  str = str.replace(_levelRegx(level), _extractGroup);
  console.log('result: ' + str);
  console.log('groups: [ \'' + groups.join('\', \'') + '\' ]');
  $('#regexGroups').text(JSON.stringify(groups, null, ' '));
}

// run once on page load, then again on every change of the input field
$(document).ready(function() {
  let str = $('#regexInput').val();
  parseRegex(str);
  $('#regexInput').on('input', function() {
    let str = $(this).val();
    parseRegex(str);
  });
});
div, input {
  font-family: monospace;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.0/jquery.min.js"></script>
<div>
  <p>Regex: <input id="regexInput" value="(\w+) count: (\d+), number: (-?\d+(\/\d+)?)" size="60" /></p>
  <p>Groups: <span id="regexGroups"></span></p>
  <p>.<br />.<br />.</p>
</div>
You can try it out with various nested patterns.
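To try patterns outside the browser snippet, the same idea can be sketched as a plain function that returns the groups instead of writing them to the page. This is only a sketch under assumptions: the name extractGroups is made up for illustration, '\x01' is assumed never to occur in the input, and it needs a runtime with lookbehind and String.prototype.matchAll (for example a recent Node.js). Like the snippet above, it does not special-case (?:...) or parentheses inside character classes.

```javascript
// DOM-free sketch of the snippet above; extractGroups is a hypothetical name.
const CTRL = '\x01'; // non-printable marker, assumed absent from the input

function extractGroups(source) {
  const groups = [];
  const strip = new RegExp(CTRL + '\\d+' + CTRL, 'g');

  // Regex matching one group whose parentheses carry the given level tag.
  const levelRegex = (level) => {
    const tag = CTRL + level + CTRL;
    return new RegExp(tag + '\\((.*?)' + tag + '\\)', 'g');
  };

  // Record every group at this level, descending into each one before
  // moving on to its next sibling.
  const collect = (text, level) => {
    for (const m of text.matchAll(levelRegex(level))) {
      groups.push(m[1].replace(strip, ''));
      collect(m[1], level + 1);
    }
  };

  // Step 1: tag each unescaped parenthesis with its nesting level.
  let depth = 0;
  const annotated = source.replace(/(?<!\\)[()]/g, (p) =>
    p === '(' ? CTRL + (depth++) + CTRL + p : CTRL + (--depth) + CTRL + p);

  // Step 2: extract the groups level by level, starting at level 0.
  collect(annotated, 0);
  return groups;
}

const sample = /(\w+) count: (\d+), number: (-?\d+(\/\d+)?)/;
console.log(extractGroups(sample.source).join(' | '));
// prints: \w+ | \d+ | -?\d+(\/\d+)? | \/\d+
```

Passing regex.source rather than a hand-typed string avoids double-escaping the backslashes.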
Explanation:
- step 1: annotate opening and closing parentheses with their nesting level:
  - the annotation is done with the control character ~
  - in real life, use a non-printable char to avoid collisions
  - the result for (\w+) is ~0~(\w+~0~)
  - the result for the default input is ~0~(\w+~0~) count: ~0~(\d+~0~), number: ~0~(-?\d+~1~(\/\d+~1~)?~0~)
- step 2: recursively extract groups:
  - we start with level 0, and extract all groups at that level
  - for each matched group we recursively extract all groups at the next level
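The nesting level tracked in the steps above is the depth of a stack, so the same bookkeeping can also be done in one pass with an explicit stack. This is a different technique from the annotate-and-recurse approach in this answer, shown only as a sketch: the name groupsByStack is hypothetical, and it still treats (?:...) and lookarounds as ordinary groups. Unlike the lookbehind used above, it skips any escaped character (so `\\(` opens a group) and ignores parentheses inside a [...] character class.

```javascript
// Alternative sketch: single pass with an explicit stack (hypothetical helper).
function groupsByStack(source) {
  const groups = [];   // one slot per group, in order of opening parenthesis
  const open = [];     // stack of { slot, start } for groups not yet closed
  let inClass = false; // true while inside a [...] character class

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') { i++; continue; }  // skip the escaped character
    if (inClass) {
      if (ch === ']') inClass = false;   // the class ends here
      continue;                          // parentheses in a class are literals
    }
    if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      groups.push(null);                 // reserve a slot to keep opening order
      open.push({ slot: groups.length - 1, start: i + 1 });
    } else if (ch === ')' && open.length > 0) {
      const { slot, start } = open.pop();
      groups[slot] = source.slice(start, i);
    }
  }
  return groups.filter((g) => g !== null); // drop groups that were never closed
}

const nested = /(\w+) count: (\d+), number: (-?\d+(\/\d+)?)/;
console.log(groupsByStack(nested.source).join(' | '));
// prints: \w+ | \d+ | -?\d+(\/\d+)? | \/\d+
console.log(groupsByStack(/[(]x(a)/.source).join(' | '));
// prints: a
```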
Source: https://stackoverflow.com/questions/66127686/how-to-parse-a-regular-expression