How does backtracking affect the language recognized by a parser?

Posted by ≯℡__Kan透↙ on 2021-02-18 22:26:49

Question


This is not a school-related question, but it comes up as an exercise in the Dragon Book (Compilers: Principles, Techniques, and Tools):

The grammar:

S ::= aSa | aa

generates all even length strings of a's except for the empty string.

a) Construct a recursive-descent parser with backtracking for this grammar that tries the alternative aSa before aa. Show that the procedure for S succeeds on 2, 4, or 8 a's, but fails on 6 a's. b) What language does your parser recognize?

I'm stumped. It seems that if 4 a's are recognized as S, and an S between two a's is recognized, then 6 a's should be recognized too. I tried implementing the parser in C, but my version recognizes every even number of a's, including 6. What does this exercise have in mind?

/* A C implementation of Exercise 4.13 in the Dragon Book */

/* The grammar:

   S ::= aSa | aa

*/

/* Construct a recursive-descent parser with backtracking for this grammar 
   that tries the alternative aSa before aa. Show that the procedure for S 
   succeeds on 2, 4, or 8 a's, but fails on 6 a's. 
*/

#include <string.h>
#include <stdio.h>

int S(const char *str, int start, int end);
int aSa(const char *str, int start, int end);
int aa(const char *str, int start, int end);

/* returns 1 if a match, 0 otherwise */
int S(const char *str, int start, int end)
{
  if(aSa(str, start, end))
    return 1;
  else
    if(aa(str, start, end))
      return 1;
  return 0;
}

/* returns 1 if a match, 0 otherwise */
int aSa(const char *str, int start, int end)
{
  int len = end - start;
  if (len < 3)
    return 0;
  /* index relative to the current window [start, end), not the whole string */
  if(str[start] != 'a')
    return 0;
  if (!S(str, start+1, end-1))
    return 0;
  if(str[end-1] != 'a')
    return 0;
  return 1;
}

/* returns 1 if a match, 0 otherwise */
int aa(const char *str, int start, int end)
{
  int len = end - start;
  if(len != 2)
    return 0;
  if(str[start] == 'a' && str[start+1] == 'a')
    return 1;
  return 0;
}

int main(void)
{
  char str[20];
  printf("Enter a string: \n");
  /* limit the read to 19 characters so str cannot overflow */
  if(scanf("%19s", str) != 1)
    return 1;
  int match = S(str, 0, (int)strlen(str));
  if(match)
    printf("The string %s matches\n", str);
  else
    printf("The string %s does not match\n", str);
  return 0;
}

Answer 1:


Even with backtracking, which requires being able to rewind the input stream, a recursive descent parser is not allowed to look ahead to the end of the input, nor is it allowed to remove symbols from both ends of the stream.

A left-to-right parser must be able to work with an input stream which has only one method:

get() : consume and read one symbol, or return an EOF symbol.

The backtracking version needs a stream with two more methods:

posn = tell()  : return an opaque value which can be used in seek()
seek(posn)     : reposition the stream to a previous position returned by tell()
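As a rough illustration of that interface, here is a minimal Python sketch of a parser for S ::= aSa | aa that touches the input only through get(), tell(), and seek(). The Stream class and the names parse_S and recognizes are made up for this example; they are assumptions, not something from the book or from any library.

```python
class Stream:
    """Hypothetical input stream exposing only get(), tell(), and seek()."""
    EOF = None

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def get(self):
        """Consume and return one symbol, or EOF when the input is exhausted."""
        if self._pos >= len(self._text):
            return Stream.EOF
        symbol = self._text[self._pos]
        self._pos += 1
        return symbol

    def tell(self):
        """Return an opaque position usable with seek()."""
        return self._pos

    def seek(self, posn):
        """Rewind to a position previously returned by tell()."""
        self._pos = posn


def parse_S(stream):
    """S ::= aSa | aa, trying aSa first and rewinding when an alternative fails."""
    start = stream.tell()
    # Alternative 1: a S a
    if stream.get() == 'a' and parse_S(stream) and stream.get() == 'a':
        return True
    stream.seek(start)
    # Alternative 2: a a
    if stream.get() == 'a' and stream.get() == 'a':
        return True
    stream.seek(start)
    return False


def recognizes(text):
    """Accept only if S succeeds and the whole input has been consumed."""
    stream = Stream(text)
    return parse_S(stream) and stream.get() is Stream.EOF
```

With this sketch, recognizes('a' * 4) is True while recognizes('a' * 6) is False, because the parser never sees the end of the input in advance and never reopens a call to S that has already returned successfully.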



Answer 2:


The problem isn't the fact that this is a backtracking or recursive descent parser; the problem is that the described implementation does not properly consider the outer context of the recursive descent parse. This is similar to the difference between a Strong LL (SLL) parser and an LL parser.

The shortest input for which the strange behavior is demonstrated is aaaaaa.

  1. We start in rule S, and match the 1st a.
  2. We invoke S.
    • We match the 2nd a.
    • We invoke S. I'll omit the specific steps, but the key is this invocation of S matches aaaa, which is the 3rd a through the end of the input. (See note that follows.)
    • We try to match a, but since the end of the input was already reached, we go back and match just the 2nd through 3rd a's (aa).
  3. We match the 4th a.

Additional note about the inner call to S that matched aaaa: If we knew to reserve an a at the end of the input for step 3, then the inner call to S could have matched aa instead of aaaa, leading to a successful parse of the complete input aaaaaa. ANTLR 4 provides this "full context" parsing ability in a recursive descent parser, and is the first recursive descent LL parser able to correctly match aa instead of aaaa for this nested invocation of S.

An SLL parser matches a^(2^k) for this grammar, that is, 2, 4, 8, 16, ... a's. A proper LL parser (such as ANTLR 4) matches a^(2k) for this grammar, that is, every even number of a's.
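The power-of-two pattern can be checked empirically. The following is only a sketch: parse_s is a hypothetical helper written for this check, and it models the exercise's parser as one that commits to the first alternative that succeeds and never revisits a call that has returned.

```python
def parse_s(text, pos):
    """Return the position just after S when started at pos, or None on failure.

    Commits to the first alternative that succeeds (no re-entry after success).
    """
    # Alternative 1: a S a
    if text[pos:pos + 1] == 'a':
        after_inner = parse_s(text, pos + 1)
        if after_inner is not None and text[after_inner:after_inner + 1] == 'a':
            return after_inner + 1
    # Alternative 2: a a
    if text[pos:pos + 2] == 'aa':
        return pos + 2
    return None


# Lengths for which S consumes the entire input.
accepted = [n for n in range(1, 33) if parse_s('a' * n, 0) == n]
print(accepted)  # [2, 4, 8, 16, 32]
```

For every other length the call still returns a position, but it stops short of the end of the input, which is what "fails on 6 a's" means in the exercise.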




Answer 3:


there's no way i'm going to write this in c for fun, but here's the parser written in python, as simple as i can make it (i hope it's clear as pseudocode, even if you don't know this language):

class Backtrack(Exception): pass

def asa(input):
    if input[0:1] == 'a':
        parsed, remaining = s(input[1:])
        if remaining[0:1] == 'a':
            return 'a' + parsed + 'a', remaining[1:]
    raise Backtrack

def aa(input):
    if input[0:2] == 'aa':
        return 'aa', input[2:]
    raise Backtrack

def s(input):
    try:
        return asa(input)
    except Backtrack:
        return aa(input)

for i in range(17):
    print(i, ': ', end='')
    try:
        print(s('a' * i))
    except Backtrack:
        print('failed')

and the results as length: (parsed, remaining):

0 : failed
1 : failed
2 : ('aa', '')
3 : ('aa', 'a')
4 : ('aaaa', '')
5 : ('aa', 'aaa')
6 : ('aaaa', 'aa')
7 : ('aaaaaa', 'a')
8 : ('aaaaaaaa', '')
9 : ('aa', 'aaaaaaa')
10 : ('aaaa', 'aaaaaa')
11 : ('aaaaaa', 'aaaaa')
12 : ('aaaaaaaa', 'aaaa')
13 : ('aaaaaaaaaa', 'aaa')
14 : ('aaaaaaaaaaaa', 'aa')
15 : ('aaaaaaaaaaaaaa', 'a')
16 : ('aaaaaaaaaaaaaaaa', '')

which i suspect will help you understand. the short answer is that recursive descent is a very specific, limited thing. it's not a complete search.

(it's a good question really. makes an important point. good book.)




Answer 4:


[Figures not preserved: the analysis procedures (parse traces) for aa, aaaa, aaaaaa, and aaaaaaaa.]

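The recursive descent parser backtracks only when an error occurs. It misses the case where a success was only "temporary": a nested call to S succeeds, but its choice makes the enclosing rule fail later, and the parser never goes back to try a different alternative inside that nested call.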
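For contrast, here is a sketch of what revisiting a "temporary" success could look like. It is not the parser the exercise asks for: it uses Python generators so that a call to S can be resumed and offer its next possible end position. The names parse_s_all and recognizes_fully are invented for this illustration.

```python
def parse_s_all(text, pos):
    """Yield every position at which S, started at pos, can end."""
    # Alternative 1: a S a, retried for each way the inner S can succeed
    if text[pos:pos + 1] == 'a':
        for after_inner in parse_s_all(text, pos + 1):
            if text[after_inner:after_inner + 1] == 'a':
                yield after_inner + 1
    # Alternative 2: a a
    if text[pos:pos + 2] == 'aa':
        yield pos + 2


def recognizes_fully(text):
    """Accept if any way of parsing S consumes the whole input."""
    return any(end == len(text) for end in parse_s_all(text, 0))
```

Under this scheme recognizes_fully('a' * 6) is True: when the inner S first claims aaaa and the enclosing rule cannot find its closing a, the inner S is resumed and offers aa instead. The result is that every even, non-empty length is accepted, which is the language the grammar generates.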



Source: https://stackoverflow.com/questions/17456994/how-does-backtracking-affect-the-language-recognized-by-a-parser
