Distinguishing identifiers from common strings

Submitted by 风格不统一 on 2019-12-13 04:49:39

Question


I want to write a parser using Bison/Yacc + Lex which can parse statements like:

VARIABLE_ID = 'STRING' 

where:

ID       [a-zA-Z_][a-zA-Z0-9_]*

and:

STRING      [a-zA-Z0-9_]+

So, var1 = '123abc' is a valid statement while 1var = '123abc' isn't.

Therefore, a VARIABLE_ID is always a STRING, but a STRING is not always a VARIABLE_ID.

What I would like to know is whether the only way to distinguish between the two is to write a checking procedure at a higher level (i.e. inside the Bison code), or whether I can work it out in the Lex code.


Answer 1:


Your abstract statement syntax is actually:

VARIABLE = STRING

and not

VARIABLE = 'STRING'

because the quote delimiters are a lexical detail that we generally want to keep out of the syntax. And so, the token patterns are actually these:

ID       [a-zA-Z_][a-zA-Z0-9_]*
STRING   '[a-zA-Z_0-9]*'

An ID is a letter or underscore, followed by any combination (including empty) of letters, digits and underscores.

A STRING is a single quote, followed by a sequence (possibly empty) of letters, digits and underscores, followed by another single quote.

So the ambiguity you are concerned about does not exist. An ID is not in fact a STRING, nor vice versa.
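To make that concrete, here is a minimal hand-written C sketch that mirrors the two patterns above. In a real scanner Lex generates this matching for you; the names classify_lexeme, is_word_char and the token_kind enum are made up for illustration and are not part of Lex or Bison.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only. */
enum token_kind { TOK_NONE, TOK_ID, TOK_STRING };

/* True for the characters in [a-zA-Z0-9_] (assuming the default "C" locale). */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Classify a complete lexeme according to the two patterns. */
enum token_kind classify_lexeme(const char *s)
{
    size_t len = strlen(s);
    size_t i;

    /* STRING: '[a-zA-Z_0-9]*'  (quote, word characters, quote) */
    if (len >= 2 && s[0] == '\'' && s[len - 1] == '\'') {
        for (i = 1; i + 1 < len; i++)
            if (!is_word_char(s[i]))
                return TOK_NONE;
        return TOK_STRING;
    }

    /* ID: [a-zA-Z_][a-zA-Z0-9_]*  (must not start with a digit) */
    if (len >= 1 && (isalpha((unsigned char)s[0]) || s[0] == '_')) {
        for (i = 1; i < len; i++)
            if (!is_word_char(s[i]))
                return TOK_NONE;
        return TOK_ID;
    }

    return TOK_NONE;
}
```

With this, var1 classifies as an ID and '123abc' (quotes included) as a STRING, while 1var matches neither pattern. Since a STRING always starts with a quote and an ID never does, no lexeme can match both.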

Somewhere inside your Bison parser, or possibly in the lexer, you might want to massage the yytext of a STRING match to remove the quotes and just retain the text in between them as a string. This could be a Bison rule, possibly similar to:

string : STRING 
       {
          $$ = strip_quotes($1);
       }
       ;
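The rule above calls strip_quotes without showing it. One possible implementation is sketched below; the signature and the ownership convention (the caller frees the result) are assumptions, not something the answer specifies. It also assumes the semantic value of STRING is a NUL-terminated copy of the matched text, quotes included.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `lexeme`
 * without its surrounding single quotes. Returns NULL if the input
 * is not quoted or if allocation fails. The caller must free() the
 * result. */
char *strip_quotes(const char *lexeme)
{
    size_t len = strlen(lexeme);

    /* A STRING lexeme contains at least the two quote characters. */
    if (len < 2 || lexeme[0] != '\'' || lexeme[len - 1] != '\'')
        return NULL;

    char *text = malloc(len - 1);   /* len - 2 characters plus the NUL */
    if (text == NULL)
        return NULL;

    memcpy(text, lexeme + 1, len - 2);
    text[len - 2] = '\0';
    return text;
}
```

For example, strip_quotes("'123abc'") yields the string 123abc, and strip_quotes("''") yields an empty string. Keep in mind that yytext is only valid until the next token is scanned, so whichever side does the stripping has to copy the text rather than keep a pointer into yytext.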


Source: https://stackoverflow.com/questions/19249397/distinguishing-identifiers-from-common-strings
