Question
I am currently building an LL(1) parser by hand. At the moment it is a good recognizer: its function parse(List tokens) decides whether or not tokens is a member of the language.
Now I want to build the corresponding AST for that input. I know how to do this in a recursive-descent parser (I have already done it). For the challenge, however, I implemented the parser with an explicit stack, using the classical algorithm:
    next <- first token of the input
    stack <- START_SYMBOL
    do {
        top <- stack.pop()
        if (top is a terminal and top == next) {
            next <- next token of the input
        } else if (top is a non terminal and PARSING_TABLE[top, next] exists) {
            stack.push(PARSING_TABLE[top, next]);
        } else {
            return invalid input;
        }
    } while (stack is not empty);
    return valid input;
where PARSING_TABLE is the LL(1) table. However, I wonder how to implement the part that builds the AST in such a configuration. I am not asking for a complete implementation, just for an implementation idea.
Thanks!
Answer 1:
Your stack can be annotated so that each entry carries an AST reference (i.e. rule number + position in the rule + target location in which to store the result) along with the symbol itself (terminal or nonterminal).
Your initial stack <- START_SYMBOL is annotated to store its result in the AST root.
Basically, your pop() selects the current AST construct. Then next <- next token saves the token's value in your AST. The stack.push(PARSING_TABLE[top, next]); opens a new AST list, writes it into the construct corresponding to top, and gives each entry pushed on the stack its 'rule number + position + target list' annotation.
When your parsing is finished, you have the entire tree.
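A minimal sketch of this idea in Ruby, assuming a hypothetical toy grammar and a hand-written table (RULES, TABLE and parse_ast are illustrative names, not something from the question). Each stack entry is a [symbol, target list] pair. Because entries are consumed left to right, appending to the target list stands in for the 'position in rule' annotation.

```ruby
# Hypothetical toy grammar (not from the original post):
#   0: list  -> "(" items ")"
#   1: items -> "a" items
#   2: items -> (empty)
RULES = [
  [:list,  ["(", :items, ")"]],
  [:items, ["a", :items]],
  [:items, []]
].freeze

# Hand-written LL(1) table: TABLE[nonterminal][lookahead] = rule number.
TABLE = {
  list:  { "(" => 0 },
  items: { "a" => 1, ")" => 2 }
}.freeze

def parse_ast(tokens)
  input = tokens.dup
  root = []                  # target list for the start symbol
  stack = [[:list, root]]    # each entry: [symbol, target list]
  until stack.empty?
    sym, target = stack.pop
    tok = input.first
    if sym.is_a?(String)     # terminal: must match, then store the token
      raise "expected #{sym}, got #{tok.inspect}" unless sym == tok
      target << input.shift
    else                     # nonterminal: predict a rule from the table
      rule_no = TABLE.fetch(sym)[tok]
      raise "no rule for #{sym} on #{tok.inspect}" if rule_no.nil?
      children = []          # new AST list owned by this nonterminal
      target << [sym, rule_no, children]
      # Push the right-hand side in reverse so its first symbol ends up on
      # top; every pushed entry writes into the same children list.
      RULES[rule_no][1].reverse_each { |s| stack.push([s, children]) }
    end
  end
  raise "extraneous input" unless input.empty?
  root.first
end
```

Each node here has the shape [nonterminal, rule number, children], and every token is kept. The node is attached to its parent at prediction time and filled in as its symbols are consumed, so no separate "rule completed" step is needed in this variant.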
In a precise AST, you might want to ignore some tokens. This can be done via appropriate annotations in the stack set during the push() part. The typical way is to attach to each of your rules (A -> B C) some meta information, for example, what is to be kept and what is the nature of the result.
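One way such meta information could look, sketched in Ruby under assumptions of my own: AnnotatedRule, its keep and kind fields, and build_node are hypothetical names. keep lists the right-hand-side positions worth retaining, and kind says what node to build from them.

```ruby
# Hypothetical rule record; none of these names come from the original post.
#   keep: positions in the right-hand side whose results are retained
#   kind: label of the node to build, or :unwrap to pass the single
#         kept child through unchanged
AnnotatedRule = Struct.new(:lhs, :rhs, :keep, :kind)

GROUP_RULE = AnnotatedRule.new(:group, ["(", :expr, ")"], [1], :unwrap)
PAIR_RULE  = AnnotatedRule.new(:pair, [:key, ":", :value], [0, 2], :pair)

# children holds one parsed result per right-hand-side symbol, in order.
def build_node(rule, children)
  kept = rule.keep.map { |i| children[i] }
  rule.kind == :unwrap ? kept.first : [rule.kind, *kept]
end
```

With these definitions, the parentheses of GROUP_RULE and the colon of PAIR_RULE never reach the tree: build_node(PAIR_RULE, ["k", ":", "v"]) keeps only positions 0 and 2.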
Answer 2:
The difficulty arises because the common method of replacing a nonterminal on the stack with the rhs of its matched-rule effectively forgets the grammatical structure at the moment it's predicted. But to generate an AST you need that structure later when a rule-parse is completed.
Rather than replacing a nonterminal with the rhs symbols of its matching rule, leave it in place and push the matched symbols as a list object. This way the stack retains the hierarchical structure of the grammar.
Parsing consumes symbols in the topmost list. The exhaustion of a list corresponds to the completion of a rule. A nonterminal is removed from the stack when its rule is completed, not when it is predicted.
As the stack is consumed, build a corresponding AST structure that remembers the relevant rule and stores the parsed tokens. Thus the stack acts like a predicted AST that flows into the parsed AST.
You can think of this as emulating the call hierarchy of a recursive-descent parser with the stack of symbol-lists as a stack of call-frames.
In Ruby:
    # g is the grammar; a list of rules
    # s is a terminal sequence to parse
    # note, this code does not tokenize EOF
    def parse(g, s)
      tab = gen_table(g)
      stack = [[g.start_sym]]
      # intermediate ast node format: [rule-action, symbols...]
      ast = [[->(rhs){[:_S, rhs[0]]}]]
      loop do
        puts "PARSE\n #{s}\n #{stack}\n #{ast}"
        if stack.first.empty?
          raise "extraneous input" if not s.empty?
          break
        end
        if stack.last.empty? # rule complete
          stack.pop
          node = ast.pop
          # transform the node (eg to a class) using the associated rule-action
          node = node.first.(node.drop(1))
          ast.last.push(node)
          stack.last.shift # rm sym from stack after completing it
          next
        end
        raise "incomplete input" if s.empty?
        tok = s.first
        topsym = stack.last.first
        if topsym.is_a? String # terminal
          raise "mismatch #{tok} != #{topsym}" if tok != topsym
          stack.last.shift
          ast.last.push(s.shift)
        elsif topsym.is_a? Symbol # nonterminal
          ri = tab[topsym][tok]
          raise "no rule for #{topsym}, #{tok}" if ri.nil?
          stack.push(g[ri].rhs.clone)
          ast.push([g[ri].action])
        end
      end
      node = ast.first
      node.first.(node.drop(1))
    end
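The code above relies on gen_table and on a grammar object that answers start_sym and g[ri] with rules exposing rhs and action; none of these are shown. To try it out, scaffolding along the following lines might do. Everything here is an assumption for illustration: Rule, Grammar and TOY_GRAMMAR are hypothetical, and this gen_table is a hand-written stand-in for one toy grammar, not a real table generator.

```ruby
# Hypothetical scaffolding for parse(g, s); none of these definitions
# come from the original answer.
Rule = Struct.new(:rhs, :action)

class Grammar
  attr_reader :start_sym

  def initialize(start_sym, rules)
    @start_sym = start_sym
    @rules = rules
  end

  # parse looks rules up by number: g[ri]
  def [](index)
    @rules[index]
  end
end

# Toy grammar: terminals are Strings, nonterminals are Symbols.
#   0: list  -> "(" items ")"
#   1: items -> "a" items
#   2: items -> (empty)
# Each action receives the parsed children of its rule, in order.
TOY_GRAMMAR = Grammar.new(:list, [
  Rule.new(["(", :items, ")"], ->(c) { [:list, *c[1]] }),
  Rule.new(["a", :items],      ->(c) { [c[0], *c[1]] }),
  Rule.new([],                 ->(_c) { [] })
])

# Stand-in for the real table generator: a hand-written LL(1) table for
# TOY_GRAMMAR only, mapping nonterminal -> lookahead -> rule number.
def gen_table(_grammar)
  {
    list:  { "(" => 0 },
    items: { "a" => 1, ")" => 2 }
  }
end
```

With these definitions loaded alongside parse, a call such as parse(TOY_GRAMMAR, ["(", "a", "a", ")"]) should return [:_S, [:list, "a", "a"]]: the rule actions drop the parentheses and flatten the nested items. Note that parse consumes the token array it is given (it calls s.shift), so pass a copy if the input is needed afterwards.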
Source: https://stackoverflow.com/questions/20153208/ll1-parser-implemented-with-stack-how-to-build-ast