ECMAScript Regex for a multilined string

Submitted by 别说谁变了你拦得住时间么 on 2019-12-02 17:59:50

Question


I am writing the loading procedure for my application. It reads data from a file and creates an object of the appropriate type with the appropriate properties.

The file consists of sequential entries (separated by a newline) in the following format:

=== OBJECT TYPE ===
<Property 1>: Value1
<Property 2>: Value2
=== END OBJECT TYPE ===

The values are often strings that may contain arbitrary characters, including newlines.

I want to create a std::regex that matches this format so that I can use std::regex_iterator to read each of the objects in the file in turn.

However, I am having trouble creating a regex that matches this format. I have looked at the ECMAScript syntax and created my regex as follows, but it does not match the string in my test application:

const std::regex regexTest( "=== ([^=]+) ===\\n([.\\n]*)\\n=== END \\1 ===" );

When I use it in the following test application, the regex fails to match the string:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string testString = "=== TEST ===\n<Random Example>:This is a =test=\n<Another Example>:Another Test||\n=== END TEST ===";

    std::cout << testString << std::endl;

    const std::regex regexTest( "=== ([^=]+) ===\\n([.\\n]*)\\n=== END \\1 ===" );
    std::smatch regexMatch;

    // With the pattern above, regex_match returns false and nothing is printed.
    if( std::regex_match( testString, regexMatch, regexTest ) )
    {
        std::cout << "Prefix: \"" << regexMatch[1] << "\"" << std::endl;
        std::cout << "Main Body: \"" << regexMatch[2] << "\"" << std::endl;
    }

    return 0;
}

Answer 1:


Your problem is simpler than it looks. This:

const std::regex regexTest( "=== ([^=]+) ===\\n((?:.|\\n)*)\\n=== END \\1 ===" );

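worked perfectly on clang++/libc++. Your version fails because . loses its special meaning inside [] brackets in ECMAScript regexes: [.\n] matches only a literal period or a newline, not "any character or a newline". Also, if you want to find more than one instance of the regex inside the string, call std::regex_search repeatedly, advancing past each match (or use std::sregex_iterator, which does this for you), instead of a single if with std::regex_match!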




Answer 2:


Try one of the following:

  1. lazy quantifiers:

    === (.+?) ===\\n([\\s\\S]*?)\\n=== END \\1 ===

  2. negative classes and negative lookaheads:

    === ((?:[^ ]+| (?!===))+) ===\\n((?:[^\\n]+|\\n(?!=== END \\1 ===))*)

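  3. POSIX (note that POSIX EREs leave lazy quantifiers such as *? and +? undefined, so check that your engine accepts them):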

    === (.+?) ===\n((.|\n)*?)\n=== END [^=]+? ===



Source: https://stackoverflow.com/questions/17133296/ecmascript-regex-for-a-multilined-string
