Indentation sensitive parser using Parslet in Ruby?

前端 未结 2 872
星月不相逢
星月不相逢 2020-12-25 08:47

I am attempting to parse a simple indentation sensitive syntax using the Parslet library within Ruby.

The following is an example of the syntax I am attempting to pa

相关标签:
2条回答
  • 2020-12-25 08:59

    There are a few approaches.

    1. Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.

    2. Use captures to store the current indent and expect the next node to include that indent plus more to match as a child (I didn't dig into this approach much as the next one occurred to me)

    3. Rules are just methods. So you can define 'node' as a method, which means you can pass parameters! (as follows)

    This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string, it generates a parser. So a recursive call will never finish.

    This is why dynamic exists. It returns a parser that isn't resolved until the point it tries to match it, allowing you to now recurse without problems.

    See the following code:

    require 'parslet'
    
    class IndentationSensitiveParser < Parslet::Parser
    
      def indent(depth)
        str('  '*depth)
      end
    
      rule(:newline) { str("\n") }
    
      rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }
    
      def node(depth) 
        indent(depth) >> 
        identifier >> 
        newline.maybe >> 
        (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
      end 
    
      rule(:document) { node(0).repeat }
    
      root :document
    end
    

    This is my favoured solution.

    0 讨论(0)
  • 2020-12-25 09:12

    I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced that other rules could use similarly to just matching "{" and "}" characters. So the following is my solution. It is a class IndentParser that any parser can extend to get nl, indent, and decent tokens generated.

    require 'parslet'
    
    # Atoms returned from a dynamic that aren't meant to match anything.
    class AlwaysMatch < Parslet::Atoms::Base
      def try(source, context, consume_all)
        succ("")
      end
    end
    class NeverMatch < Parslet::Atoms::Base
      attr_accessor :msg
      def initialize(msg = "ignore")
        self.msg = msg
      end
      def try(source, context, consume_all)
        context.err(self, source, msg)
      end
    end
    class ErrorMatch < Parslet::Atoms::Base
      attr_accessor :msg
      def initialize(msg)
        self.msg = msg
      end
      def try(source, context, consume_all)
        context.err(self, source, msg)
      end
    end
    
    class IndentParser < Parslet::Parser
    
      ##
      # Indentation handling: when matching a newline we check the following indentation. If
      # that indicates an indent token or detent tokens (1+) then we stick these in a class
      # variable and the high-priority indent/dedent rules will match as long as these 
      # remain. The nl rule consumes the indentation itself.
    
      rule(:indent)  { dynamic {|s,c| 
        if @indent.nil?
          NeverMatch.new("Not an indent")
        else
          @indent = nil
          AlwaysMatch.new
        end
      }}
      rule(:dedent)  { dynamic {|s,c|
        if @dedents.nil? or @dedents.length == 0
          NeverMatch.new("Not a dedent")
        else
          @dedents.pop
          AlwaysMatch.new
        end
      }}
    
      def checkIndentation(source, ctx)
        # See if next line starts with indentation. If so, consume it and then process
        # whether it is an indent or some number of dedents.
        indent = ""
        while source.matches?(Regexp.new("[ \t]"))
          indent += source.consume(1).to_s #returns a Slice
        end
    
        if @indentStack.nil?
          @indentStack = [""]
        end
    
        currentInd = @indentStack[-1]
        return AlwaysMatch.new if currentInd == indent #no change, just match nl
    
        if indent.start_with?(currentInd)
          # Getting deeper
          @indentStack << indent
          @indent = indent #tells the indent rule to match one
          return AlwaysMatch.new
        else
          # Either some number of de-dents or an error
    
          # Find first match starting from back
          count = 0
          @indentStack.reverse.each do |level|
            break if indent == level #found it, 
    
            if level.start_with?(indent)
              # New indent is prefix, so we de-dented this level.
              count += 1
              next
            end
    
            # Not a match, not a valid prefix. So an error!
            return ErrorMatch.new("Mismatched indentation level")
          end
    
          @dedents = [] if @dedents.nil?
          count.times { @dedents << @indentStack.pop }
          return AlwaysMatch.new
        end
      end
      rule(:nl)         { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}
    
      rule(:unixnl)     { str("\n") }
      rule(:macnl)      { str("\r") }
      rule(:winnl)      { str("\r\n") }
      rule(:anynl)      { unixnl | macnl | winnl }
    
    end
    

    I'm sure a lot can be improved, but this is what I've come up with so far.

    Example usage:

    class MyParser < IndentParser
      rule(:colon)      { str(':') >> space? }
    
      rule(:space)      { match(' \t').repeat(1) }
      rule(:space?)     { space.maybe }
    
      rule(:number)     { match['0-9'].repeat(1).as(:num) >> space? }
      rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }
    
      rule(:block)      { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
      rule(:stmt)       { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
      rule(:testblock)  { identifier.as(:name) >> block }
    
      rule(:prgm)       { testblock >> nl.repeat }
      root :prgm
    end
    
    0 讨论(0)
提交回复
热议问题