How to get capturing group functionality in Go regular expressions

闹比i 2020-12-04 21:47

I'm porting a library from Ruby to Go, and have just discovered that regular expressions in Ruby are not compatible with Go (Google RE2). It's come to my attention that Ru

5 Answers
  • 2020-12-04 22:00

    how should I re-write these expressions?

    Add some Ps, as defined in the RE2 syntax, where a named group is written (?P<name>re):

    (?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})
    

    Cross-reference the capture group names with r.SubexpNames(): its i-th entry is the name of the i-th submatch (empty for index 0 and for unnamed groups).

    And use as follows:

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    func main() {
        r := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
        fmt.Printf("%#v\n", r.FindStringSubmatch(`2015-05-27`))
        fmt.Printf("%#v\n", r.SubexpNames())
    }
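
    When only one or two groups are needed by name, the index can be looked up directly instead of scanning the SubexpNames slice. A minimal sketch, assuming Go 1.15 or later (where Regexp.SubexpIndex is available); the helper name group and the variable dateRe are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)

// group returns the text captured by the named group, or "" when the
// input does not match or the pattern defines no group with that name.
func group(re *regexp.Regexp, input, name string) string {
	m := re.FindStringSubmatch(input)
	idx := re.SubexpIndex(name) // -1 if no group has this name
	if m == nil || idx < 0 {
		return ""
	}
	return m[idx]
}

func main() {
	fmt.Println(group(dateRe, "2015-05-27", "Year"))
	fmt.Println(group(dateRe, "2015-05-27", "Day"))
}
```

    Returning "" for both "no match" and "empty capture" keeps the sketch short; a second boolean return value would distinguish the two.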
    
  • 2020-12-04 22:04

    I wrote a function for parsing URLs with regular expressions, but it suits your needs too. It works like this:

    // getParams parses url with the given regular expression and returns
    // the values of the groups defined in the expression, keyed by group name.
    // Requires: import "regexp"
    func getParams(regEx, url string) (paramsMap map[string]string) {
    
        compRegEx := regexp.MustCompile(regEx)
        match := compRegEx.FindStringSubmatch(url)
    
        paramsMap = make(map[string]string)
        for i, name := range compRegEx.SubexpNames() {
            // Skip index 0 (the whole match); match is nil when nothing matched.
            if i > 0 && i < len(match) {
                paramsMap[name] = match[i]
            }
        }
        return
    }
    

    You can use this function like:

    params := getParams(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`, `2015-05-27`)
    fmt.Println(params)
    

    and the output will be (since Go 1.12, fmt prints map keys in sorted order):

    map[Day:27 Month:05 Year:2015]
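
    Note that getParams returns an empty map when nothing matches and stores unnamed groups under the empty key. A hedged variant that reports whether the input matched and skips unnamed groups could look like the following sketch; the names namedGroups and partialDateRe are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// The month group is deliberately left unnamed to show that it is skipped.
var partialDateRe = regexp.MustCompile(`(?P<Year>\d{4})-(\d{2})-(?P<Day>\d{2})`)

// namedGroups returns the named captures of the first match and reports
// whether the input matched at all. Unnamed groups are skipped.
func namedGroups(re *regexp.Regexp, input string) (map[string]string, bool) {
	match := re.FindStringSubmatch(input)
	if match == nil {
		return nil, false
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			groups[name] = match[i]
		}
	}
	return groups, true
}

func main() {
	groups, ok := namedGroups(partialDateRe, "2015-05-27")
	fmt.Println(groups, ok)
}
```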
    
  • 2020-12-04 22:04

    To keep RAM and CPU usage down, the next example avoids calling anonymous functions inside the loop and avoids growing slices with "append" inside the loop.

    It captures several subgroups from multiline text without concatenating strings with '+' and without nesting one for loop inside another (as other examples posted here do).

    txt := `2001-01-20
    2009-03-22
    2018-02-25
    2018-06-07`
    
    regex := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    res := regex.FindAllStringSubmatch(txt, -1)
    for i := range res {
        // like Java: match.group(1), match.group(2), etc.
        fmt.Printf("year: %s, month: %s, day: %s\n", res[i][1], res[i][2], res[i][3])
    }
    

    Output:

    year: 2001, month: 01, day: 20
    year: 2009, month: 03, day: 22
    year: 2018, month: 02, day: 25
    year: 2018, month: 06, day: 07
    

    Note: res[i][0] holds the whole match, like match.group(0) in Java.

    If you want to store this information use a struct type:

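    type date struct {
        y, m, d int
    }
    ...
    func main() {
        ...
        // Capacity is reserved up front, so append below never reallocates.
        dates := make([]date, 0, len(res))
        for _, match := range res {
            // Needs import "strconv". The pattern only matches digits,
            // so the conversion errors are ignored here.
            y, _ := strconv.Atoi(match[1])
            m, _ := strconv.Atoi(match[2])
            d, _ := strconv.Atoi(match[3])
            dates = append(dates, date{y: y, m: m, d: d})
        }
    }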
    

    It's better to use unnamed (positional) groups when you only access submatches by index (I consider this a performance improvement).

    Using the "ReplaceAllGroupFunc" function posted on GitHub is a bad idea because:

    1. it uses a loop inside a loop
    2. it calls an anonymous function inside the loop
    3. it needs a lot of code
    4. it uses the "append" function inside the loop, and that's bad. Whenever the slice's capacity is exceeded, "append" allocates a new backing array and copies the elements to the new memory position
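
    Whether "append" copies depends on the slice's capacity: it only allocates a new backing array when the current one is full. The sketch below illustrates this by counting capacity changes; the function name countReallocs is hypothetical, and the exact count without preallocation depends on the Go version's growth strategy:

```go
package main

import "fmt"

// countReallocs appends n ints to a slice created with the given initial
// capacity and counts how often the capacity changed, which happens
// exactly when append had to allocate a new backing array.
func countReallocs(n, initialCap int) int {
	s := make([]int, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	fmt.Println("preallocated:", countReallocs(1000, 1000))
	fmt.Println("not preallocated:", countReallocs(1000, 0))
}
```

    With enough capacity reserved up front, as in make([]date, 0, len(res)), appending inside the loop does not copy anything.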
  • 2020-12-04 22:14

    If you need to replace based on a function while capturing groups you can use this:

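    import "regexp"
    
    // ReplaceAllGroupFunc replaces every match of re in str with the result of
    // repl. repl receives the whole match in groups[0] and the capturing
    // groups in groups[1], groups[2], and so on.
    func ReplaceAllGroupFunc(re *regexp.Regexp, str string, repl func([]string) string) string {
        result := ""
        lastIndex := 0
    
        for _, v := range re.FindAllStringSubmatchIndex(str, -1) {
            groups := []string{}
            for i := 0; i < len(v); i += 2 {
                if v[i] < 0 {
                    // The group did not participate in this match.
                    groups = append(groups, "")
                    continue
                }
                groups = append(groups, str[v[i]:v[i+1]])
            }
    
            result += str[lastIndex:v[0]] + repl(groups)
            lastIndex = v[1]
        }
    
        return result + str[lastIndex:]
    }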
    

    Example:

    str := "abc foo:bar def baz:qux ghi"
    re := regexp.MustCompile("([a-z]+):([a-z]+)")
    result := ReplaceAllGroupFunc(re, str, func(groups []string) string {
        return groups[1] + "." + groups[2]
    })
    fmt.Printf("'%s'\n", result)
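
    For a fixed replacement like this one, where no per-match logic is needed, the standard library's ReplaceAllString is enough: in its template, ${n} expands to the text of group n. A minimal sketch; the names pairRe and joinPairs are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

var pairRe = regexp.MustCompile(`([a-z]+):([a-z]+)`)

// joinPairs rewrites every "key:value" pair as "key.value" using the
// template syntax of ReplaceAllString, where ${n} expands to group n.
func joinPairs(s string) string {
	return pairRe.ReplaceAllString(s, "${1}.${2}")
}

func main() {
	fmt.Printf("'%s'\n", joinPairs("abc foo:bar def baz:qux ghi"))
	// prints 'abc foo.bar def baz.qux ghi'
}
```

    A function-based helper is still the better fit when the replacement has to be computed from the groups.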
    

    https://gist.github.com/elliotchance/d419395aa776d632d897

  • 2020-12-04 22:15

    A simple way to print each group name next to its value, based on @VasileM's answer.

    Disclaimer: this is not about memory/CPU/time optimization.

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    func main() {
        r := regexp.MustCompile(`^(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})$`)
    
        res := r.FindStringSubmatch(`2015-05-27`)
        names := r.SubexpNames()
        for i := range res {
            if i != 0 {
                fmt.Println(names[i], res[i])
            }
        }
    }
    

    https://play.golang.org/p/Y9cIVhMa2pU
