std::regex, to match begin/end of string

误落风尘 2020-12-31 05:46

In JS regular expressions, the symbols ^ and $ designate the start and end of the string. And only with the /m modifier (multiline

4 Answers
  • 2020-12-31 06:22

    The following code snippet matches email addresses that start with one or more letters a-z, followed by zero or more dots, then zero or more letters a-z, and that end with "@gmail.com". The match is case-insensitive. I tested it.

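    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main() {
        // ^ and $ anchor the pattern to the whole input; icase ignores letter case
        regex reg1("^[a-z]+\\.*[a-z]*@gmail\\.com$", regex_constants::icase);
        string email;
        cin >> email;
        if (regex_search(email, reg1))
            cout << "match\n";
    }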
    
  • 2020-12-31 06:23

    You can emulate Perl/Python/PCRE \A, which matches at beginning of string but not after a newline, with the JavaScript regex ^(?<!(.|\n)), which translates to English as "match the beginning of a line which has no preceding character".

    You can emulate Perl/Python/PCRE \z, which matches only at end-of-string, using (?!(.|\n))$. To get the effect of \Z, which matches only at end-of-string but allows a single newline just before that end-of-string, just add an optional newline: \n?(?!(.|\n))$.
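
    The ECMAScript grammar used by std::regex has no lookbehind, so the \A emulation above does not carry over to C++, but the lookahead-based \z emulation does. A minimal sketch, assuming input that uses only \n as a line terminator; the helper name digits_at_end_of_input is hypothetical and only serves the illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the run of digits that ends exactly at the
// end of the input, or an empty string if there is no such run.
std::string digits_at_end_of_input(const std::string& text) {
    // (?!(?:.|\n)) succeeds only where no character follows, i.e. at the
    // true end of the input. The pattern does not use $ at all.
    static const std::regex re("\\d+(?!(?:.|\\n))");
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

    Because the pattern avoids $ entirely, the result should not depend on whether a given standard library treats $ as a line anchor.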

  • 2020-12-31 06:39

    TL;DR

    • MSVC: the ^ and $ already match start and end of lines
    • C++17: use std::regex_constants::multiline option
    • Other compilers only match start of string with ^ and end of string with $, with no possibility to redefine their behavior.

    In all std::regex implementations other than MSVC, and before C++17, ^ and $ match the beginning and end of the string, not of a line. See this demo, which does not find any match in "1\n2\n3" with the ^\d+$ regex. When you add alternations (see below), there are 3 matches.

    However, in MSVC and C++17, the ^ and $ may match start/end of the line.

    C++17

    Use the std::regex_constants::multiline option.
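
    A minimal sketch of that option, assuming a standard library that actually implements the C++17 multiline flag (support varies by library version); the function name digit_lines is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every line of `text` that consists only of digits.
// Assumes a C++17 standard library that implements the multiline flag.
std::vector<std::string> digit_lines(const std::string& text) {
    const std::regex re("^\\d+$",
                        std::regex_constants::ECMAScript |
                        std::regex_constants::multiline);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

    With the flag set, ^ and $ are meant to match at line boundaries as well as at the ends of the input, so each all-digit line of "1\n2\n3" becomes its own match.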

    MSVC compiler

    In a C++ project in Visual Studio, the following

    std::regex r("^\\d+$");
    std::string st("1\n2\n3");
    for (std::sregex_iterator i = std::sregex_iterator(st.begin(), st.end(), r);
        i != std::sregex_iterator();
        ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
    }
    

    will output

    Match value: 1 at Position 0
    Match value: 2 at Position 2
    Match value: 3 at Position 4
    

    Workarounds that work across C++ compilers

    There is no universal option in std::regex to make the anchors match start/end of the line across all compilers. You need to emulate it with alternations:

    ^ -> (^|\n)
    $ -> (?=\n|$)
    

    Note that $ can be "emulated" fully with (?=\n|$) (where you may add more line terminator symbols or symbol sequences, like (?=\r?\n|\r|$)), but with ^, you cannot find a 100% workaround.

    Since std::regex has no lookbehind support, (^|\n) consumes the newline in front of the line, so you might have to adjust other parts of your regex pattern, for example by wrapping the part you actually need in a capturing group where a lookbehind would otherwise have sufficed.
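
    The two substitutions can be combined as in the following sketch, which assumes \n is the only line terminator in the input; the function name digit_lines_portable is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every all-digit line of `text` without relying on ^ and $
// behaving as line anchors.
std::vector<std::string> digit_lines_portable(const std::string& text) {
    // (?:^|\n) stands in for a line-start anchor and consumes the newline;
    // (?=\n|$) stands in for a line-end anchor without consuming anything.
    static const std::regex re("(?:^|\\n)(\\d+)(?=\\n|$)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str(1));  // group 1 excludes the consumed newline
    return out;
}
```

    The line-end side uses a lookahead so that the newline stays available as the start of the next match; the line-start side has to consume it, which is why the digits are read from capturing group 1 rather than from the whole match.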

  • 2020-12-31 06:39

    By default, ECMAScript mode already treats ^ as both beginning-of-input and beginning-of-line, and $ as both end-of-input and end-of-line. There is no way to make them match only beginning-of-input or end-of-input, but it is possible to make them match only beginning-of-line or end-of-line:

    When invoking std::regex_match, std::regex_search, or std::regex_replace, there is an argument of type std::regex_constants::match_flag_type that defaults to std::regex_constants::match_default.

    • To specify that ^ matches only beginning-of-line, specify std::regex_constants::match_not_bol
    • To specify that $ matches only end-of-line, specify std::regex_constants::match_not_eol
    • As these values are bitflags, to specify both, simply bitwise-or them together (std::regex_constants::match_not_bol | std::regex_constants::match_not_eol)
    • Note that beginning-of-input can be implied without using ^ and regardless of the presence of std::regex_constants::match_not_bol by specifying std::regex_constants::match_continuous
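
    A small sketch of how these flags are passed, using single-line inputs so the outcome does not depend on how a given library treats line boundaries; the wrapper name search_with is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: runs std::regex_search with the given match flags.
bool search_with(const std::string& text, const std::string& pattern,
                 std::regex_constants::match_flag_type flags =
                     std::regex_constants::match_default) {
    return std::regex_search(text, std::regex(pattern), flags);
}
```

    On the input "abc", search_with("abc", "^a") succeeds, while adding std::regex_constants::match_not_bol makes it fail because the start of the input no longer counts as a line start. Likewise "c$" fails with match_not_eol, and with match_continuous the pattern "b" fails while "a" succeeds, since the match must begin at the first character.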

    This is explained well in the ECMAScript grammar documentation on cppreference.com, which I highly recommend over cplusplus.com in general.

    Caveat: I've tested with MSVC, Clang + libc++, and Clang + libstdc++, and only MSVC has the correct behavior at present.
