How to test ANTLR translation without adding EOF to every rule

Submitted by 為{幸葍}努か on 2021-01-28 05:52:12

Question


I am in the middle of re-writing my translator and I am being much more disciplined about tests this time, since this version is likely to live for more than a few weeks.

Because you can run a visitor starting at any node, you can almost write beautiful small tests like this ...

expect(parse("some test code", "startGrammarRule")).toEqual(new ASTForGrammarRule())

and then write one (or a few) of these for each visitor function.

EXCEPT that the rule you are invoking is a sub-rule and so does not end in "EOF". So if my grammar has, somewhere in it,

numberList: NUMBER ( ',' NUMBER )* ;

... then parse("1,2,3", "numberList") only parses "1" (because only an "EOF" would make the parser hungry enough to consume the whole string).

Editing the rule to add EOF is a non-starter. I could, for every rule I write a test for, add a test version of the rule ...

numberList: NUMBER ( ',' NUMBER )* ;
numberList_TEST: numberList EOF ;

... but that is going to clutter the grammar and add the worry that the _TEST rules always have to be maintained scrupulously ...

What I want is a flag, set when I create the parser, that constructs that faux TEST rule dynamically and then parses from there, or something like that ...

Is there a better way to write tests for my parser that I haven't figured out yet?
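The faux TEST rule described above boils down to "run the sub-rule, then require EOF", which is exactly what `numberList_TEST: numberList EOF ;` does. The minimal sketch below shows that wrapper in isolation. `tokenize`, `MiniParser` and `parseAsStartRule` are hypothetical, hand-written stand-ins for an ANTLR-generated lexer and parser, so the example runs without any generated code. With a real ANTLR parser the analogous step would presumably be to call the generated rule method and then the parser's `match` method with `Token.EOF`, but treat that as an assumption to verify against your runtime.

```typescript
// Hypothetical token shape for this sketch (not the ANTLR runtime's Token).
type TokenType = "NUMBER" | "COMMA" | "ID" | "EOF";
interface Tok {
  type: TokenType;
  text: string;
}

// Tiny hand-written lexer; whitespace is skipped, much like a HIDDEN WS rule.
// Characters that match none of the patterns are silently dropped.
function tokenize(src: string): Tok[] {
  const words = src.match(/\d+|,|[a-zA-Z]+/g) ?? [];
  const tokens: Tok[] = words.map((text): Tok => ({
    type: /^\d/.test(text) ? "NUMBER" : text === "," ? "COMMA" : "ID",
    text,
  }));
  tokens.push({ type: "EOF", text: "<EOF>" });
  return tokens;
}

// Hypothetical stand-in for a generated parser.
class MiniParser {
  private pos = 0;

  constructor(private readonly tokens: Tok[]) {}

  private peek(): Tok {
    return this.tokens[this.pos];
  }

  // Consume a token of the expected type or throw; EOF is never consumed.
  match(type: TokenType): Tok {
    const t = this.peek();
    if (t.type !== type) {
      throw new Error(`expected ${type} but found '${t.text}'`);
    }
    if (type !== "EOF") this.pos++;
    return t;
  }

  // numberList : NUMBER ( ',' NUMBER )* ;   (a sub-rule: no EOF)
  numberList(): string[] {
    const nums = [this.match("NUMBER").text];
    while (this.peek().type === "COMMA") {
      this.match("COMMA");
      nums.push(this.match("NUMBER").text);
    }
    return nums;
  }
}

// The faux TEST rule, built on the fly: run any rule, then demand EOF,
// which is what `rule_TEST : rule EOF ;` would do in the grammar.
function parseAsStartRule<T>(parser: MiniParser, rule: (p: MiniParser) => T): T {
  const result = rule(parser);
  parser.match("EOF");
  return result;
}

// The bare sub-rule stops silently before "FOO"...
const lenient = new MiniParser(tokenize("1, 2, 3 FOO")).numberList();
console.log(lenient); // [ '1', '2', '3' ]

// ...while the wrapped version reports the leftover token.
try {
  parseAsStartRule(new MiniParser(tokenize("1, 2, 3 FOO")), (p) => p.numberList());
} catch (e) {
  console.log((e as Error).message); // expected EOF but found 'FOO'
}
```

Because the EOF check lives in the test helper rather than in the grammar, there are no `_TEST` rules to keep in sync.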


Answer 1:


In a Java project, I use a custom matcher that checks whether the parsed tokens make up 100% of the token stream, and fails if they do not.

You seem to be using the TypeScript target, so in TypeScript that could look like this:

T.g4

grammar T;

parse      : numberList EOF;
numberList : NUMBER ( ',' NUMBER )*;

NUMBER : [0-9]+;
ID     : [a-zA-Z]+;
WS     : [ \t\r\n]+ -> channel(HIDDEN);

parserMatchers.ts

import { TLexer } from '../src/parser/TLexer';
import { BailErrorStrategy, CharStreams, CommonTokenStream } from 'antlr4ts';
import { TParser } from '../src/parser/TParser';
import { Lexer } from 'antlr4ts/Lexer';

expect.extend({
  toBeCompletelyParsedBy: (source: string, ruleName: string) => {
    const lexer = new TLexer(CharStreams.fromString(source));
    lexer.removeErrorListeners();
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TParser(tokenStream);
    parser.removeErrorListeners();
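    // Throw on the first syntax error instead of trying to recover
    parser.errorHandler = new BailErrorStrategy();
    // Invoke the rule by name, e.g. parser.numberList()
    const context = (parser as any)[ruleName]();

    // CommonTokenStream buffers tokens lazily, so fetch the rest of the input
    tokenStream.fill();

    // Collect the real tokens: non-HIDDEN and non-EOF tokens
    const realTokens = tokenStream.getTokens().filter((t) =>
      t.channel === Lexer.DEFAULT_TOKEN_CHANNEL && t.type !== Lexer.EOF);

    // Pass if the rule's stop token is the last real token
    const indexOfStop = realTokens.indexOf(context.stop);
    const pass = realTokens.length === (indexOfStop + 1);

    const message = () => {

      if (pass) {
        return `Expected '${source}' not to be completely parsed by rule '${ruleName}', but it was.`;
      }

      const offending = realTokens[indexOfStop + 1];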

      return `Expected '${source}' to be completely parsed by rule '${ruleName}', but '${offending.text}' ` +
        `(${offending.line}:${offending.charPositionInLine}) was not included!`;
    };

    return { pass, message };
  }
});

declare global {
  namespace jest {
    interface Matchers<R> {
      toBeCompletelyParsedBy(ruleName: string): R
    }
  }
}

export {};
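The pass/fail decision inside the matcher does not depend on the generated parser, so one option is to pull it out into a pure function and unit-test it directly with hand-built tokens. This is a sketch, assuming a hypothetical `TokenLike` interface that mirrors only the `Token` fields the matcher reads; the two constants use ANTLR's values for `Token.EOF` (-1) and `Token.DEFAULT_CHANNEL` (0), and `checkConsumption` is a name made up for illustration.

```typescript
// Hypothetical minimal token shape: only the fields the check reads.
interface TokenLike {
  type: number;
  channel: number;
  text?: string;
  line: number;
  charPositionInLine: number;
}

// ANTLR's values for Token.EOF and Token.DEFAULT_CHANNEL.
const EOF = -1;
const DEFAULT_CHANNEL = 0;

interface Consumption {
  pass: boolean;
  offending?: TokenLike; // first real token the rule did not consume
}

// `stop` is the last token the rule matched (a rule context's `stop`).
function checkConsumption(all: TokenLike[], stop: TokenLike | undefined): Consumption {
  // Real tokens: on the default channel and not EOF.
  const real = all.filter((t) => t.channel === DEFAULT_CHANNEL && t.type !== EOF);
  const indexOfStop = stop === undefined ? -1 : real.indexOf(stop);
  const pass = real.length === indexOfStop + 1;
  return pass ? { pass } : { pass, offending: real[indexOfStop + 1] };
}

// Hand-built tokens for "3, 4 FOO" (token types 1..4 are arbitrary here).
const tok = (type: number, text: string, col: number, channel = DEFAULT_CHANNEL): TokenLike => ({
  type, channel, text, line: 1, charPositionInLine: col,
});
const three = tok(1, "3", 0);
const comma = tok(2, ",", 1);
const four = tok(1, "4", 3);
const foo = tok(4, "FOO", 5);
const tokens = [three, comma, tok(3, " ", 2, 1), four, tok(3, " ", 4, 1), foo, tok(EOF, "<EOF>", 8)];

console.log(checkConsumption(tokens, foo).pass); // true
const partial = checkConsumption(tokens, four);
console.log(partial.pass, partial.offending?.text); // false FOO
```

Inside `toBeCompletelyParsedBy`, the matcher could then call something like `checkConsumption(tokenStream.getTokens(), context.stop)` and build its failure message from `offending`. The antlr4ts `Token` objects should satisfy `TokenLike` structurally, but that is an assumption worth confirming with the compiler.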

And in your unit tests, you can now do this:

import './parserMatchers';

test('the numberList parser rule', () => {
  expect('3, 4, 5').toBeCompletelyParsedBy('numberList');
  expect('3, 4, 5 FOO').not.toBeCompletelyParsedBy('numberList');
});


Source: https://stackoverflow.com/questions/65689300/how-to-test-antlr-translation-without-adding-eof-to-every-rule
