How to match a regex with backreference in Go?

北恋 2020-11-29 09:27

I need to match a regex that uses backreferences (e.g. \1) in my Go code.

That's not so easy because in Go, the official regexp package uses the RE2 engine, one th

4 Answers
  • 2020-11-29 10:00

    The regexp package functions FindSubmatchIndex and Expand can capture content and refer back to it by group (e.g. $1 in an Expand template). It isn't very convenient, but it is possible.
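
A minimal sketch of that approach, assuming a hypothetical helper swapPair and a made-up key=value input. It only shows group references in an Expand template; RE2 still rejects \1 inside the pattern itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe captures a key and a value separated by "=".
var pairRe = regexp.MustCompile(`(\w+)=(\w+)`)

// swapPair (hypothetical helper) rewrites the first "key=value" found in
// src as "value=key". FindSubmatchIndex returns the group offsets, and
// Expand substitutes $1 and $2 in the template from those offsets.
func swapPair(src string) string {
	b := []byte(src)
	loc := pairRe.FindSubmatchIndex(b)
	if loc == nil {
		return "" // no match
	}
	return string(pairRe.Expand(nil, []byte("$2=$1"), b, loc))
}

func main() {
	fmt.Println(swapPair("name=gopher")) // prints "gopher=name"
}
```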

  • 2020-11-29 10:01

    When I had the same problem, I solved it using a two-step regular expression match. The original code is:

    if m := match(pkgname, `^(.*)\$\{DISTNAME:S(.)(\\^?)([^:]*)(\\$?)\2([^:]*)\2(g?)\}(.*)$`); m != nil {
        before, _, left, from, right, to, mod, after := m[1], m[2], m[3], m[4], m[5], m[6], m[7], m[8]
        // ...
    }
    

    The code is supposed to parse a string of the form ${DISTNAME:S|from|to|g}, which itself is a little pattern language using the familiar substitution syntax S|replace|with|.

    The two-stage code looks like this:

    if m, before, sep, subst, after := match4(pkgname, `^(.*)\$\{DISTNAME:S(.)([^\\}:]+)\}(.*)$`); m {
        qsep := regexp.QuoteMeta(sep)
        if m, left, from, right, to, mod := match5(subst, `^(\^?)([^:]*)(\$?)`+qsep+`([^:]*)`+qsep+`(g?)$`); m {
            // ...
        }
    }
    

    match, match4 and match5 are my own wrappers around the regexp package. They cache the compiled regular expressions, so that at least the compilation time is not wasted.
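
For readers without those wrappers, here is a self-contained sketch of the same two-stage idea using only the standard library. The names compileCached, Subst and parseSubst are hypothetical stand-ins, not the answer's own helpers, and the sample input is made up.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Cache of compiled patterns keyed by pattern source (hypothetical,
// standing in for the caching done by the answer's own wrappers).
var (
	cacheMu sync.Mutex
	cache   = map[string]*regexp.Regexp{}
)

func compileCached(pattern string) *regexp.Regexp {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	re, ok := cache[pattern]
	if !ok {
		re = regexp.MustCompile(pattern)
		cache[pattern] = re
	}
	return re
}

// Subst holds the pieces of a ${DISTNAME:S<sep>from<sep>to<sep>[g]} modifier.
type Subst struct {
	Before, Left, From, Right, To, Mod, After string
}

func parseSubst(s string) (Subst, bool) {
	// Stage 1: capture the separator and everything up to the closing brace.
	outer := compileCached(`^(.*)\$\{DISTNAME:S(.)([^\\}:]+)\}(.*)$`)
	m := outer.FindStringSubmatch(s)
	if m == nil {
		return Subst{}, false
	}
	before, sep, subst, after := m[1], m[2], m[3], m[4]

	// Stage 2: splice the quoted separator in where \2 would have been.
	qsep := regexp.QuoteMeta(sep)
	inner := compileCached(`^(\^?)([^:]*)(\$?)` + qsep + `([^:]*)` + qsep + `(g?)$`)
	n := inner.FindStringSubmatch(subst)
	if n == nil {
		return Subst{}, false
	}
	return Subst{Before: before, Left: n[1], From: n[2], Right: n[3],
		To: n[4], Mod: n[5], After: after}, true
}

func main() {
	p, ok := parseSubst("foo-${DISTNAME:S|-src|-bin|g}.tar")
	fmt.Println(ok, p.From, p.To, p.Mod)
}
```

Because regexp.QuoteMeta always yields a valid pattern fragment, the second-stage MustCompile cannot panic on an unusual separator such as "^" or ".".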

  • 2020-11-29 10:11

    Regular expressions are great for working with regular grammars, but if your grammar isn't regular (for example, because it needs backreferences) you should probably switch to a better tool. There are a lot of good tools available for parsing context-free grammars, including yacc (for Go, goyacc, which used to ship with the Go distribution and now lives in golang.org/x/tools). Alternatively, you can write your own parser. Recursive descent parsers, for example, are easy to write by hand.
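
As an illustration of the hand-written route, the sketch below parses a substitution modifier of the form S<sep>from<sep>to<sep>[g] without any regular expression. It is a small hand-rolled splitter rather than a full recursive descent parser. The function name parseModifier and the single-byte separator are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseModifier (hypothetical) parses "S<sep>from<sep>to<sep>[g]" by hand.
// The byte right after 'S' is taken as the separator and must then occur
// exactly twice more, which is what the \2 backreference would enforce.
// Assumes a single-byte separator.
func parseModifier(mod string) (from, to string, global, ok bool) {
	if len(mod) < 2 || mod[0] != 'S' {
		return "", "", false, false
	}
	sep := mod[1:2]
	parts := strings.Split(mod[2:], sep)
	if len(parts) != 3 {
		return "", "", false, false
	}
	switch parts[2] {
	case "":
		// no flag
	case "g":
		global = true
	default:
		return "", "", false, false
	}
	return parts[0], parts[1], global, true
}

func main() {
	from, to, global, ok := parseModifier("S|from|to|g")
	fmt.Println(from, to, global, ok)
}
```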

    I think regular expressions are overused in scripting languages (like Perl, Python, Ruby, ...) because their C/ASM-powered implementations are usually more optimized than the languages themselves, but Go isn't such a language. Regular expressions are usually quite slow and are often not suited to the problem at all.

  • 2020-11-29 10:13

    Answering my own question: I solved this using golang-pkg-pcre, which uses libpcre++ to provide Perl-style regexes that do support backreferences. Its API is not the same as that of the regexp package.
