Partial grammar for counting class count

Submitted by 社会主义新天地 on 2020-01-04 02:27:18

Question


I need to count the number of classes in a valid C# source file. I wrote the following grammar:

grammar CSharpClassGrammar;

options
{
        language=CSharp2;

}

@parser::namespace { CSharpClassGrammar.Generated }
@lexer::namespace  { CSharpClassGrammar.Generated }

@header
{
        using System;
        using System.Collections.Generic;

}

@members
{
        private List<string> _classCollector = new List<string>();
        public List<string> ClassCollector { get { return _classCollector; } }

}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

csfile  : class_declaration* EOF
        ;

class_declaration
        : (ACCESSLEVEL | MODIFIERS)* PARTIAL? 'class' CLASSNAME
          class_body
          ';'?
          { _classCollector.Add($CLASSNAME.text); }
        ;

class_body
        : '{' class_declaration* '}'
        ;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

ACCESSLEVEL
        : 'public' | 'internal' | 'protected' | 'private' | 'protected internal'
        ;

MODIFIERS
        : 'static' | 'sealed' | 'abstract'
        ;

PARTIAL
        : 'partial'
        ;

CLASSNAME
        : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
        ;

COMMENT
        : '//' ~('\n'|'\r')* {$channel=HIDDEN;}
        |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
        ;

WHITESPACE
        : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
        ; 

This parser correctly counts classes whose bodies are empty or contain only nested class declarations:

internal class DeclarationClass1
{
    class DeclarationClass2
    {
        public class DeclarationClass3
        {
            abstract class DeclarationClass4
            {
            }
        }
    }
}

But I also need to count classes whose bodies contain other members, such as:

class TestClass
{
    int a = 42;

    class Nested { }
}

I need to somehow ignore all the code that is not a class declaration. In the example above, that means ignoring

int a = 42;

How can I do this? Is there perhaps an example for another language?
Please help!
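One way to read "ignore everything that is not a class declaration" on the parser side is to let a class body accept three things: a nested class declaration, a balanced { ... } block, or any other single token, which is simply skipped. The C# sketch below is a hand-written recursive-descent illustration of that idea, not ANTLR-generated code and not the approach taken in the answer. The ClassCounter type, its regex-based tokenizer and its helper methods are hypothetical, and comments and string literals are assumed to have been removed beforehand (the grammar above already sends comments to the HIDDEN channel).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical recursive-descent sketch (not ANTLR output) of:
//   class_body : '{' ( class_declaration | block | <any other token> )* '}'
// Non-class tokens are consumed and ignored; nested { ... } blocks are matched
// so that a method's closing brace is not mistaken for the end of the class.
// Assumes comments and string literals have already been removed.
public class ClassCounter
{
    private readonly List<string> _tokens = new List<string>();
    private int _pos;

    public List<string> ClassCollector { get; } = new List<string>();

    public ClassCounter(string source)
    {
        // Crude tokenizer: identifiers/keywords, or any single non-space character.
        foreach (Match m in Regex.Matches(source, @"[A-Za-z_][A-Za-z0-9_]*|\S"))
            _tokens.Add(m.Value);
    }

    private string Peek(int k = 0) =>
        _pos + k < _tokens.Count ? _tokens[_pos + k] : null;

    private static bool IsIdentifier(string t) =>
        t != null && Regex.IsMatch(t, @"^[A-Za-z_][A-Za-z0-9_]*$");

    // csfile : member* EOF
    public void ParseFile()
    {
        while (Peek() != null) ParseMember();
    }

    // member : class_declaration | block | any single token (ignored)
    private void ParseMember()
    {
        if (Peek() == "class" && IsIdentifier(Peek(1)))
        {
            ClassCollector.Add(Peek(1));   // CLASSNAME
            _pos += 2;
            // Skip anything between the name and the body (e.g. a base-class list).
            while (Peek() != null && Peek() != "{") _pos++;
            ParseBlock();                  // class_body
        }
        else if (Peek() == "{")
        {
            ParseBlock();                  // method body, initializer, ...
        }
        else
        {
            _pos++;                        // not a class declaration: ignore it
        }
    }

    // block : '{' member* '}'
    private void ParseBlock()
    {
        if (Peek() != "{") return;
        _pos++;
        while (Peek() != null && Peek() != "}") ParseMember();
        if (Peek() == "}") _pos++;
    }
}
```

Modifiers such as public, abstract or partial need no rule of their own here: they are just tokens that get skipped. Unlike the grammar in the question, which runs its action after class_body, this sketch records each name before parsing the body, so outer classes are listed before the classes nested inside them.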


Answer 1:


When you're only interested in certain parts of a source file, you can set filter=true in the options { ... } section. This lets you define only the tokens you're interested in; anything you don't define is ignored by the lexer.

Note that this only works in lexer grammars, not in combined (or parser) grammars.
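To make the filter semantics concrete, here is a rough C# approximation of what the generated lexer does with the rules of the demo that follows. As I understand ANTLR 3's filter mode, the rules are tried in the order they appear in the grammar, the first rule that matches wins, and when no rule matches a single character is skipped. FilterScanner and its Match* helpers are hypothetical names; this is a sketch of the behavior, not the code ANTLR generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hand-written approximation of the filter=true lexer.
// At each position the rules are tried in grammar order (Comment, String,
// Class, Identifier); the first rule that matches consumes its text. When
// nothing matches, one character is skipped, which also covers the Space rule.
public static class FilterScanner
{
    public static List<string> FindClasses(string src)
    {
        var found = new List<string>();
        int i = 0;
        while (i < src.Length)
        {
            int end = MatchComment(src, i);
            if (end == i) end = MatchString(src, i);
            if (end == i) end = MatchClass(src, i, found);
            if (end == i) end = MatchIdentifier(src, i);
            i = end > i ? end : i + 1; // no rule matched: ignore this character
        }
        return found;
    }

    static bool StartsAt(string s, int i, string lit) =>
        i + lit.Length <= s.Length && string.CompareOrdinal(s, i, lit, 0, lit.Length) == 0;

    static bool IsSpace(char c) => c == ' ' || c == '\t' || c == '\r' || c == '\n';
    static bool IsIdStart(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    static bool IsIdPart(char c) => IsIdStart(c) || (c >= '0' && c <= '9');

    // Each Match* method returns the end index of its match, or i if it does not match.
    static int MatchComment(string s, int i)
    {
        if (StartsAt(s, i, "//"))
        {
            int j = i + 2;
            while (j < s.Length && s[j] != '\r' && s[j] != '\n') j++;
            return j;
        }
        if (StartsAt(s, i, "/*"))
        {
            int close = s.IndexOf("*/", i + 2, StringComparison.Ordinal);
            return close < 0 ? i : close + 2; // unterminated: no match
        }
        return i;
    }

    static int MatchString(string s, int i)
    {
        if (s[i] == '"') // regular string: backslash escapes, no raw line breaks
        {
            int j = i + 1;
            while (j < s.Length && s[j] != '"' && s[j] != '\r' && s[j] != '\n')
                j += s[j] == '\\' ? 2 : 1;
            return j < s.Length && s[j] == '"' ? j + 1 : i;
        }
        if (StartsAt(s, i, "@\"")) // verbatim string: "" stands for one quote
        {
            for (int j = i + 2; j < s.Length; j++)
            {
                if (s[j] != '"') continue;
                if (j + 1 < s.Length && s[j + 1] == '"') { j++; continue; }
                return j + 1;
            }
        }
        return i;
    }

    static int MatchClass(string s, int i, List<string> found)
    {
        if (!StartsAt(s, i, "class")) return i;
        int j = i + 5;
        while (j < s.Length && IsSpace(s[j])) j++;
        if (j == i + 5) return i;             // Space+ needs at least one character
        int end = MatchIdentifier(s, j);
        if (end == j) return i;               // no Identifier after the spaces
        found.Add(s.Substring(j, end - j));   // what the rule's action prints
        return end;
    }

    static int MatchIdentifier(string s, int i)
    {
        if (i >= s.Length || !IsIdStart(s[i])) return i;
        int j = i + 1;
        while (j < s.Length && IsIdPart(s[j])) j++;
        return j;
    }
}
```

Fed the test source from the C# harness further down, FindClasses should return TestClass, Nested and DoubleNested, the same three names the ANTLR demo prints.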

A little demo:

lexer grammar CSharpClassLexer;

options {
  language=CSharp2;
  filter=true;
}

@namespace { Demo }

Comment
  :  '//' ~('\r' | '\n')*
  |  '/*' .* '*/'
  ;

String
  :  '"' ('\\' . | ~('"' | '\\' | '\r' | '\n'))* '"'
  |  '@' '"' ('"' '"' | ~'"')* '"'
  ;

Class
  :  'class' Space+ Identifier 
     {Console.WriteLine("Found class: " + $Identifier.text);}
  ;

Space
  :  ' ' | '\t' | '\r' | '\n'
  ;

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

It's important to keep Identifier as a token rule of its own, because otherwise Xclass Foo would be tokenized as ['X', 'class', 'Foo'] and Foo would be reported as a class. With Identifier in there, Xclass is matched as one complete identifier.
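The same pitfall is easy to reproduce with a plain regular expression, which is a swapped-in technique here and not part of the answer's grammar. The naive pattern finds class Foo inside Xclass Foo; adding a lookbehind that forbids an identifier character before class does the job that the top-level Identifier rule does in the filter lexer. XclassDemo and its methods are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class XclassDemo
{
    // Naive: "class" may be the tail of a longer word such as "Xclass".
    static readonly Regex NaivePattern =
        new Regex(@"class\s+([A-Za-z_][A-Za-z0-9_]*)");

    // Whole word: the lookbehind rejects a preceding identifier character,
    // much like keeping Identifier as a token makes "Xclass" one identifier.
    static readonly Regex WholeWordPattern =
        new Regex(@"(?<![A-Za-z0-9_])class\s+([A-Za-z_][A-Za-z0-9_]*)");

    public static string[] Naive(string src) => Names(NaivePattern, src);
    public static string[] WholeWord(string src) => Names(WholeWordPattern, src);

    // Returns the captured class names in the order they occur.
    static string[] Names(Regex pattern, string src) =>
        pattern.Matches(src).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
}
```

For example, Naive("Xclass Foo") returns one name, Foo, while WholeWord("Xclass Foo") returns none. Unlike the lexer, neither pattern knows about comments or strings.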

The grammar can be tested with the following class:

using System;
using Antlr.Runtime;

namespace Demo
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            string source = 
@"class TestClass
{
    int a = 42;

    string _class = ""inside a string literal: class FooBar {}..."";

    class Nested { 
        /* class NotAClass {} */

        // class X { }

        class DoubleNested {
            string str = @""
                multi line string 
                class Bar {}
            "";
        }
    }
}";
            Console.WriteLine("source=\n" + source + "\n-------------------------");
            ANTLRStringStream Input = new ANTLRStringStream(source);
            CSharpClassLexer Lexer = new CSharpClassLexer(Input);
            CommonTokenStream Tokens = new CommonTokenStream(Lexer);
            Tokens.GetTokens();
        }
    }
}

which produces the following output:

source=
class TestClass
{
    int a = 42;

    string _class = "inside a string literal: class FooBar {}...";

    class Nested { 
        /* class NotAClass {} */

        // class X { }

        class DoubleNested {
            string str = @"
                multi line string 
                class Bar {}
            ";
        }
    }
}
-------------------------
Found class: TestClass
Found class: Nested
Found class: DoubleNested

Note that this is just a quick demo: I'm not sure I handled C# string literals correctly in the grammar (I'm unfamiliar with C#), but it should give you a start.
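One case worth checking in the String rule is C# character literals. As far as I can tell, with the rules above the input char q = '"'; string s = "class Foo"; would be lexed so that the quote inside the char literal starts a String token ending at the opening quote of the real string, after which Class matches class Foo and reports a class that doesn't exist. The hypothetical LiteralStripper below sidesteps this by blanking out comments, char literals, and regular and verbatim strings before any search for class declarations. It is a sketch and does not specifically handle interpolated strings.

```csharp
using System;
using System.Text;

// Hypothetical pre-pass: blanks out comments, char literals and string
// literals so a later search for "class <Name>" cannot see inside them.
public static class LiteralStripper
{
    public static string Strip(string s)
    {
        var sb = new StringBuilder(s.Length);
        int i = 0;
        while (i < s.Length)
        {
            int end = SkipLiteral(s, i);
            if (end > i)
            {
                // Replace the literal with spaces, keeping line breaks.
                for (int k = i; k < end; k++) sb.Append(s[k] == '\n' ? '\n' : ' ');
                i = end;
            }
            else
            {
                sb.Append(s[i]);
                i++;
            }
        }
        return sb.ToString();
    }

    // Returns the index just past a comment or literal starting at i, or i if none starts there.
    static int SkipLiteral(string s, int i)
    {
        char c = s[i];
        char next = i + 1 < s.Length ? s[i + 1] : '\0';
        if (c == '/' && next == '/')                 // line comment
        {
            int j = i + 2;
            while (j < s.Length && s[j] != '\r' && s[j] != '\n') j++;
            return j;
        }
        if (c == '/' && next == '*')                 // block comment
        {
            int close = s.IndexOf("*/", i + 2, StringComparison.Ordinal);
            return close < 0 ? s.Length : close + 2;
        }
        if (c == '@' && next == '"')                 // verbatim string, "" is an escaped quote
        {
            for (int j = i + 2; j < s.Length; j++)
            {
                if (s[j] != '"') continue;
                if (j + 1 < s.Length && s[j + 1] == '"') { j++; continue; }
                return j + 1;
            }
            return s.Length;
        }
        if (c == '"' || c == '\'')                   // regular string or char literal
        {
            int j = i + 1;
            while (j < s.Length && s[j] != c && s[j] != '\n')
                j += s[j] == '\\' ? 2 : 1;           // a backslash escapes the next character
            return j < s.Length && s[j] == c ? j + 1 : Math.Min(j, s.Length);
        }
        return i;
    }
}
```

Replacing characters with spaces instead of deleting them keeps the text the same length, so any positions found afterwards still point into the original source.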

Good luck!



Source: https://stackoverflow.com/questions/4914073/partial-grammar-for-counting-class-count
