I need to match a regex that uses backreferences (e.g. \\1) in my Go code.
That\'s not so easy because in Go, the official regexp package uses the RE2 engine, one th
regexp package funcs FindSubmatchIndex and Expand can capture content by backreferences. It isn't very convenient, but it is still possible. Example
When I had the same problem, I solved it using a two-step regular expression match. The original code is:
if m := match(pkgname, `^(.*)\$\{DISTNAME:S(.)(\\^?)([^:]*)(\\$?)\2([^:]*)\2(g?)\}(.*)$`); m != nil {
before, _, left, from, right, to, mod, after := m[1], m[2], m[3], m[4], m[5], m[6], m[7], m[8]
// ...
}
The code is supposed to parse a string of the form ${DISTNAME:S|from|to|g}
, which itself is a little pattern language using the familiar substitution syntax S|replace|with|
.
The two-stage code looks like this:
if m, before, sep, subst, after := match4(pkgname, `^(.*)\$\{DISTNAME:S(.)([^\\}:]+)\}(.*)$`); m {
qsep := regexp.QuoteMeta(sep)
if m, left, from, right, to, mod := match5(subst, `^(\^?)([^:]*)(\$?)`+qsep+`([^:]*)`+qsep+`(g?)$`); m {
// ...
}
}
The match
, match4
and match5
are my own wrapper around the regexp
package, and they cache the compiled regular expressions so that at least the compilation time is not wasted.
Regular Expressions are great for working with regular grammars, but if your grammar isn't regular (i.e. requires back-references and stuff like that) you should probably switch to a better tool. There are a lot of good tools available for parsing context-free grammars, including yacc which is shipped with the Go distribution by default. Alternatively, you can also write your own parser. Recursive descent parsers can be easily written by hand for example.
I think regular expressions are overused in scripting languages (like Perl, Python, Ruby, ...) because their C/ASM powered implementation is usually more optimized than those languages itself, but Go isn't such a language. Regular expressions are usually quite slow and are often not suited for the problem at all.
Answering my own question here, I solved this using golang-pkg-pcre, it uses libpcre++, perl regexes that do support backreferences. The API is not the same.