Regex with named capture groups getting all matches in Ruby

滥情空心 2021-02-02 08:12

I have a string:

s = "123--abc,123--abc,123--abc"

I tried using Ruby 1.9's new "named groups" feature to fetch all named group info:

10 Answers
  • 2021-02-02 09:16

    You can get the names of the capture groups from the regexp with Regexp#names. I used the regular scan method to get the matches, then zipped the names with each match to build a Hash.

    class String
      def scan2(regexp)
        names = regexp.names
        scan(regexp).collect do |match|
          Hash[names.zip(match)]
        end
      end
    end
    

    Usage:

    >> "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ffffd".scan2 /(?<url>(?<protocol>https?):\/\/[\S]+)/
    => [{"url"=>"http://www.google.com.tr", "protocol"=>"http"}, {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
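
    The same idea can be sketched without reopening String. The helper name scan_named below is hypothetical (it is not part of the answer), and the sketch assumes every group in the regexp is named and that no name is used twice, so that names and captures line up one to one:

    ```ruby
    # Hypothetical helper: same technique as scan2, but as a plain method.
    def scan_named(str, regexp)
      names = regexp.names # e.g. ["number", "chars"]
      # scan yields one array of captures per match; pair each with its name.
      str.scan(regexp).map { |captures| names.zip(captures).to_h }
    end

    s = "123--abc,123--abc,123--abc"
    rows = scan_named(s, /(?<number>\d+)--(?<chars>[a-z]+)/)
    rows.each { |row| puts "#{row["number"]} / #{row["chars"]}" }
    ```

    Each element of rows is a Hash keyed by group name, so the caller never has to remember the group order.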
    
  • 2021-02-02 09:17

    Named captures are only really suitable for a single match.
    Ruby's analogue of findall is String#scan. You can either use the result of scan as an array or pass it a block:

    irb> s = "123--abc,123--abc,123--abc"
    => "123--abc,123--abc,123--abc"
    
    irb> s.scan(/(\d*)--([a-z]*)/)
    => [["123", "abc"], ["123", "abc"], ["123", "abc"]]
    
    irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
    irb*     p [number,chars]
    irb> end
    ["123", "abc"]
    ["123", "abc"]
    ["123", "abc"]
    => "123--abc,123--abc,123--abc"
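
    To make the contrast concrete, here is a small sketch comparing a single match with scan. The sample string is varied from the question's so that the matches differ, and the group names number and chars are assumptions chosen for illustration:

    ```ruby
    s  = "123--abc,456--def,789--ghi" # assumed sample data
    re = /(?<number>\d+)--(?<chars>[a-z]+)/

    # String#match stops at the first match and returns one MatchData,
    # so access by group name works here.
    first = s.match(re)
    puts first[:number]
    puts first[:chars]

    # String#scan finds every match, but hands back plain arrays of captures.
    pairs = s.scan(re)
    pairs.each { |number, chars| puts "#{number} -> #{chars}" }
    ```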
    
  • 2021-02-02 09:18

    I needed something similar recently. This should work like String#scan, but return an array of MatchData objects instead.

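    class String
      # Returns an array of MatchData objects rather than the array of
      # strings returned by the vanilla `scan`.
      # Note: each match runs against the remaining string, so offsets and
      # anchors such as ^ are relative to that remainder.
      def match_all(regex)
        match_str = self
        match_datas = []
        until match_str.empty?
          md = match_str.match(regex)
          break unless md
          match_datas << md
          # A zero-length match at the start would never shrink the string.
          break if md.post_match == match_str
          match_str = md.post_match
        end
        match_datas
      end
    end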
    

    Running your sample data in the REPL results in the following:

    > "123--abc,123--abc,123--abc".match_all(/(?<number>\d*)--(?<chars>[a-z]*)/)
    => [#<MatchData "123--abc" number:"123" chars:"abc">,
        #<MatchData "123--abc" number:"123" chars:"abc">,
        #<MatchData "123--abc" number:"123" chars:"abc">]
    

    You may also find my test code useful:

    describe String do
      describe :match_all do
        it "works like scan, but returns MatchData objects instead of arrays of strings" do
          mds = "ABC-123, DEF-456, GHI-098".match_all(/(?<word>[A-Z]+)-(?<number>[0-9]+)/)
          mds[0][:word].should   == "ABC"
          mds[0][:number].should == "123"
          mds[1][:word].should   == "DEF"
          mds[1][:number].should == "456"
          mds[2][:word].should   == "GHI"
          mds[2][:number].should == "098"
        end
      end
    end
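
    A variation on the loop above, offered only as a sketch: instead of re-matching against post_match, it passes a start position to Regexp#match, so each MatchData keeps offsets relative to the original string. The method name match_all_at is hypothetical, and stepping one character past a zero-length match is an assumption about the desired behaviour:

    ```ruby
    # Hypothetical helper: collect every MatchData by advancing a position.
    def match_all_at(str, regexp)
      matches = []
      pos = 0
      while pos <= str.length && (md = regexp.match(str, pos))
        matches << md
        # Continue after the match; step one character further when the
        # match was empty so the loop always makes progress.
        pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
      end
      matches
    end

    mds = match_all_at("ABC-123, DEF-456, GHI-098", /(?<word>[A-Z]+)-(?<number>[0-9]+)/)
    mds.each { |md| puts "#{md[:word]} #{md[:number]} at offset #{md.begin(0)}" }
    ```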
    
  • 2021-02-02 09:18

    Piggybacking off of Mark Hubbart's answer, I added the following monkey-patch:

    class ::Regexp
      def match_all(str)
        matches = []
        str.scan(self) { matches << $~ }
    
        matches
      end
    end
    

    which can be used as /(?<letter>\w)/.match_all('word'), and returns:

    [#<MatchData "w" letter:"w">, #<MatchData "o" letter:"o">, #<MatchData "r" letter:"r">, #<MatchData "d" letter:"d">]

    As others have said, this relies on $~ holding the MatchData for the current match inside the scan block.
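
    If patching Regexp is not wanted, the same technique can be used inline. This is a minimal sketch on the question's string, using Regexp.last_match (the method form of $~) and MatchData#named_captures, which needs Ruby 2.4 or later; the group names are assumptions:

    ```ruby
    s  = "123--abc,123--abc,123--abc"
    re = /(?<number>\d+)--(?<chars>[a-z]+)/

    records = []
    # Inside the block, Regexp.last_match is the MatchData of the current match.
    s.scan(re) { records << Regexp.last_match.named_captures }

    records.each { |rec| puts "#{rec["number"]} / #{rec["chars"]}" }
    ```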
