How to match only those numbers which have an even number of `%`s preceding them?

Submitted by 与世无争的帅哥 on 2019-12-10 04:24:36

Question


I want to catch numbers appearing anywhere in a string, and replace them with "(.+)".

But I want to catch only those numbers which have an even number of %s preceding them. No worries if any surrounding chars get caught up: we can use capture groups to filter out the numbers.

I'm unable to come up with an ECMAScript regular expression that does this.

Here is the playground:

abcd %1 %%2 %%%3 %%%%4 efgh

abcd%12%%34%%%666%%%%11efgh

A successful catch will behave like this:


Things I have tried:


As you may have noticed, the third attempt almost works. The only problems are in the second line of the playground. What I actually wanted that expression to say is:

Match a number if it is preceded by an even number of %s AND either of the following is true:

  • The above whole expression is preceded by nothing [absence of (unconsumed or otherwise) character].
  • The above whole expression is preceded by a character other than %.

Is there a way to match the absence of a character?
That's what I was trying to do by using \0 in the third attempt.


Answer 1:


You can use (?:[^%\d]|^|\b(?=%))(?:%%)*(\d+) as a pattern, where your number is stored in the first capturing group. This also matches numbers preceded by zero % characters.

The even run of % signs is matched only if it is preceded by one of the following:

  • a character that is neither % nor a digit (digits are excluded so that a match cannot start inside a number; chains like %%1%%2 are covered by the word-boundary case)
  • the start of the string
  • a word boundary followed by % (so any word character, including a digit, may come directly before the % signs), for the chains mentioned above

You can see it in action here
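
A minimal sketch of how this pattern could be applied with String.prototype.replace. It assumes one change that is not in the answer: the prefix is wrapped in an extra capturing group so it can be written back and only the digits are replaced. The helper name replaceEvenPercentNumbers is made up for illustration.

```javascript
// The answer's pattern, with the prefix wrapped in group 1 (an assumption of
// this sketch) so the replacement can keep it; the digits become group 2.
const pattern = /((?:[^%\d]|^|\b(?=%))(?:%%)*)(\d+)/g;

// Hypothetical helper: swaps each matched number for the literal text "(.+)"
// and leaves the prefix (separator character and % signs) untouched.
function replaceEvenPercentNumbers(input) {
  return input.replace(pattern, (match, prefix, digits) => prefix + '(.+)');
}

console.log(replaceEvenPercentNumbers('abcd %1 %%2 %%%3 %%%%4 efgh'));
// abcd %1 %%(.+) %%%3 %%%%(.+) efgh
console.log(replaceEvenPercentNumbers('abcd%12%%34%%%666%%%%11efgh'));
// abcd%12%%(.+)%%%666%%%%(.+)efgh
```

A replacer function is used instead of a replacement string so that no character in the output is interpreted as a special replacement pattern.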




Answer 2:


Issue

You want a regex with a positive infinite-width (variable-width) lookbehind:

(?<=(^|[^%])(?:%%)*)\d+

Here is the .NET regex demo
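
In engines that implement lookbehind assertions (they were added to the language in ECMAScript 2018), the same idea could probably be written directly in JavaScript. The sketch below assumes such an engine. The (?<!\d) guard is an addition of this sketch: without it, a match could begin in the middle of a digit run, such as the "2" in "%12", because the digit before it is a non-% character. The name replaceWithLookbehind is hypothetical.

```javascript
// Assumes ES2018 lookbehind support.
// (?<!\d)  : do not start in the middle of a number (added in this sketch)
// (?<=...) : an even run of % signs, itself preceded by start-of-string
//            or by a character that is not %
const evenPercentNumber = /(?<!\d)(?<=(?:^|[^%])(?:%%)*)\d+/g;

// Hypothetical helper: only the digits are matched, so a plain replacement
// string is enough and the % signs stay in place.
function replaceWithLookbehind(input) {
  return input.replace(evenPercentNumber, '(.+)');
}

console.log(replaceWithLookbehind('abcd %1 %%2 %%%3 %%%%4 efgh'));
// abcd %1 %%(.+) %%%3 %%%%(.+) efgh
console.log(replaceWithLookbehind('abcd%12%%34%%%666%%%%11efgh'));
// abcd%12%%(.+)%%%666%%%%(.+)efgh
```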

In ES7, lookbehind is not supported, so you need to use language-specific means together with a simplified regex that matches any number of % before a digit sequence: /(%*)(\d+)/g. Then check inside the replace callback whether the number of percentage signs is even, and proceed accordingly.

JavaScript

Instead of trying to emulate a variable-width lookbehind, you may just use JS means:

var re = /(%*)(\d+)/g;          // Group 1: zero or more percentage signs, Group 2: the digits
var str = 'abcd %1 %%2 %%%3 %%%%4 efgh<br/><br/>abcd%12%%34%%%666%%%%11efgh';
var res = str.replace(re, function(m, g1, g2) { // Use a callback inside replace
  // If the number of %s is even, return Group 1 + (.+), else return the whole match
  return (g1.length % 2 === 0) ? g1 + '(.+)' : m;
});
document.body.innerHTML = res;

If the digits must be preceded by at least two % signs, use the /(%+)(\d+)/g pattern instead: %+ requires at least one percentage sign, and the even-length check in the callback then guarantees at least two.
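
To make the difference between the two patterns concrete, here is a small sketch that runs both through the same callback. The helper name replaceIfEven and the sample string are made up for illustration.

```javascript
// Hypothetical helper shared by both variants: replaces the digits only when
// the run of % signs captured in group 1 has even length.
function replaceIfEven(input, re) {
  return input.replace(re, (match, percents, digits) =>
    percents.length % 2 === 0 ? percents + '(.+)' : match);
}

const sample = '7 %%8 %9';

// %* also accepts an empty run of % signs, so the bare "7" is replaced.
console.log(replaceIfEven(sample, /(%*)(\d+)/g)); // (.+) %%(.+) %9

// %+ needs at least one % sign, so the bare "7" is left alone.
console.log(replaceIfEven(sample, /(%+)(\d+)/g)); // 7 %%(.+) %9
```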

Conversion to C++

The same algorithm can be used in C++. The only problem is that there is no built-in support for a callback method inside the std::regex_replace. It can be added manually, and used like this:

#include <algorithm>   // std::for_each
#include <cstdlib>
#include <iostream>
#include <iterator>    // std::advance
#include <regex>
#include <string>
using namespace std;

template<class BidirIt, class Traits, class CharT, class UnaryFunction>
std::basic_string<CharT> regex_replace(BidirIt first, BidirIt last,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    std::basic_string<CharT> s;

    typename std::match_results<BidirIt>::difference_type
        positionOfLastMatch = 0;
    auto endOfLastMatch = first;

    auto callback = [&](const std::match_results<BidirIt>& match)
    {
        auto positionOfThisMatch = match.position(0);
        auto diff = positionOfThisMatch - positionOfLastMatch;

        auto startOfThisMatch = endOfLastMatch;
        std::advance(startOfThisMatch, diff);

        s.append(endOfLastMatch, startOfThisMatch);
        s.append(f(match));

        auto lengthOfMatch = match.length(0);

        positionOfLastMatch = positionOfThisMatch + lengthOfMatch;

        endOfLastMatch = startOfThisMatch;
        std::advance(endOfLastMatch, lengthOfMatch);
    };

    std::regex_iterator<BidirIt, CharT, Traits> begin(first, last, re), end;
    std::for_each(begin, end, callback);

    s.append(endOfLastMatch, last);

    return s;
}

template<class Traits, class CharT, class UnaryFunction>
std::string regex_replace(const std::string& s,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    return regex_replace(s.cbegin(), s.cend(), re, f);
}

std::string my_callback(const std::smatch& m) {
  if (m.str(1).length() % 2 == 0) {
    return m.str(1) + "(.+)";
  } else {
    return m.str(0);
  }
}

int main() {
    std::string s = "abcd %1 %%2 %%%3 %%%%4 efgh\n\nabcd%12%%34%%%666%%%%11efgh";
    cout << regex_replace(s, regex("(%*)(\\d+)"), my_callback) << endl;

    return 0;
}

See the IDEONE demo.

Special thanks for the callback code go to John Martin.




Answer 3:


I don't know ECMAScript, but the following documentation has the answer:

ECMAScript regex

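Search for negative lookahead, which will result in something like this:

(?!%)(([%]{2})*\d+)

...where (?!%) is a negative lookahead, meaning "not followed by a % literal" at the current position. Note that it does not inspect what precedes the match, so on its own it does not rule out an odd run of %s.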
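
A quick way to check a candidate pattern like this one is to run it over the first playground line and list what it matches. The sketch below assumes a standard ECMAScript engine, and the variable names are illustrative. Because (?!%) looks at the character that follows the current position, a match can only begin at a digit, so every number is reported no matter how many % signs come before it.

```javascript
// The pattern from this answer, with the global flag added so that all
// matches in the string are collected.
const lookaheadPattern = /(?!%)(([%]{2})*\d+)/g;

// With the g flag, String.prototype.match returns the full text of each match.
const matches = 'abcd %1 %%2 %%%3 %%%%4 efgh'.match(lookaheadPattern);

console.log(matches); // [ '1', '2', '3', '4' ]
```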



Source: https://stackoverflow.com/questions/38291499/how-to-match-only-those-numbers-which-have-an-even-number-of-s-preceding-them
