Question
I need to count the number of classes in a correct C# source file. I wrote the following grammar:
grammar CSharpClassGrammar;
options
{
language=CSharp2;
}
@parser::namespace { CSharpClassGrammar.Generated }
@lexer::namespace { CSharpClassGrammar.Generated }
@header
{
using System;
using System.Collections.Generic;
}
@members
{
private List<string> _classCollector = new List<string>();
public List<string> ClassCollector { get { return _classCollector; } }
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
csfile : class_declaration* EOF
;
class_declaration
: (ACCESSLEVEL | MODIFIERS)* PARTIAL? 'class' CLASSNAME
class_body
';'?
{ _classCollector.Add($CLASSNAME.text); }
;
class_body
: '{' class_declaration* '}'
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
ACCESSLEVEL
: 'public' | 'internal' | 'protected' | 'private' | 'protected internal'
;
MODIFIERS
: 'static' | 'sealed' | 'abstract'
;
PARTIAL
: 'partial'
;
CLASSNAME
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WHITESPACE
: ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
;
This parser correctly counts classes with an empty body, including nested ones:
internal class DeclarationClass1
{
class DeclarationClass2
{
public class DeclarationClass3
{
abstract class DeclarationClass4
{
}
}
}
}
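For illustration, here is a rough Python mirror of the three parser rules above. The tokenizer and the class and method names are hypothetical stand-ins, not the code ANTLR generates. As in the grammar's action, a name is recorded only after its class body has been parsed, so inner classes are listed first. Because class_body accepts nothing but nested class declarations, a member such as int a = 42; produces a syntax error. The generated ANTLR parser would report that error and try to recover, while this sketch simply raises SyntaxError.

```python
import re

# Hypothetical stand-ins for the grammar's ACCESSLEVEL/MODIFIERS tokens.
MODIFIERS = {"public", "internal", "protected", "private",
             "static", "sealed", "abstract"}
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(src):
    # Words or single non-space characters; whitespace is dropped,
    # much like the HIDDEN channel in the grammar.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\S", src)


class ClassParser:
    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0
        self.classes = []

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else "<EOF>"

    def take(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def starts_class(self):
        return self.peek() in MODIFIERS or self.peek() in ("partial", "class")

    def csfile(self):
        # csfile : class_declaration* EOF
        while self.starts_class():
            self.class_declaration()
        self.take("<EOF>")
        return self.classes

    def class_declaration(self):
        # (ACCESSLEVEL | MODIFIERS)* PARTIAL? 'class' CLASSNAME class_body ';'?
        while self.peek() in MODIFIERS:
            self.take()
        if self.peek() == "partial":
            self.take()
        self.take("class")
        name = self.take()
        if not NAME.fullmatch(name):
            raise SyntaxError(f"bad class name {name!r}")
        self.class_body()
        if self.peek() == ";":
            self.take()
        self.classes.append(name)  # like the action: runs after the body

    def class_body(self):
        # class_body : '{' class_declaration* '}'
        self.take("{")
        while self.starts_class():
            self.class_declaration()
        self.take("}")


print(ClassParser("internal class A { class B { } }").csfile())  # ['B', 'A']
```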
I also need to count classes with a non-empty body, such as:
class TestClass
{
int a = 42;
class Nested { }
}
I need to somehow ignore all the code that is not a class declaration. In the example above, that means ignoring
int a = 42;
How can I do this? Maybe there is an example for another language?
Please help!
Answer 1:
When you're only interested in certain parts of a source file, you can set filter=true in your options { ... } section. This lets you define only the tokens you're interested in; anything you don't define is ignored by the lexer.
Note that this only works with lexer grammars, not with combined (or parser) grammars.
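As I understand ANTLR 3's filter mode, at each input position the lexer tries the defined rules in the order they are listed and emits the first one that matches. If none matches, it skips a single character and tries again. The following Python sketch models that loop. The function filter_lex and the rule table are hypothetical, written for illustration only, and are not part of ANTLR:

```python
import re

# Hypothetical rule table: only the tokens we care about, in priority order.
RULES = [
    ("Class", re.compile(r"class\s+[A-Za-z_][A-Za-z0-9_]*")),
]


def filter_lex(text, rules):
    """Rough model of filter=true: at each position try the rules in order,
    emit the first non-empty match, or skip one character if none matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:  # ignore empty matches to avoid looping
                tokens.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            pos += 1  # nothing matched here: silently drop this character
    return tokens


source = "class TestClass\n{\n    int a = 42;\n    class Nested { }\n}"
print(filter_lex(source, RULES))
# -> [('Class', 'class TestClass'), ('Class', 'class Nested')]
```

Nothing in int a = 42; matches a rule, so those characters are simply skipped, which is the effect the question asks for.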
A little demo:
lexer grammar CSharpClassLexer;
options {
language=CSharp2;
filter=true;
}
@namespace { Demo }
Comment
: '//' ~('\r' | '\n')*
| '/*' .* '*/'
;
String
: '"' ('\\' . | ~('"' | '\\' | '\r' | '\n'))* '"'
| '@' '"' ('"' '"' | ~'"')* '"'
;
Class
: 'class' Space+ Identifier
{Console.WriteLine("Found class: " + $Identifier.text);}
;
Space
: ' ' | '\t' | '\r' | '\n'
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
;
It's important that you leave the Identifier rule in there, because you don't want Xclass Foo to be tokenized as ['X', 'class', 'Foo']. With the Identifier rule in place, Xclass becomes a single identifier.
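The effect of the Identifier rule can be checked with Python's re module. re.finditer scans left to right and skips characters where nothing matches, and the alternatives of | are tried in order. That is close to how the filter-mode lexer behaves, though it is only an approximation. The names CLASS, IDENTIFIER and found_classes are hypothetical:

```python
import re

CLASS = r"class\s+(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"


def found_classes(text, with_identifier):
    # Class is tried first; Identifier (if present) wins only where Class fails,
    # e.g. at the 'X' of "Xclass", where it swallows the whole word.
    pattern = CLASS + ("|" + IDENTIFIER if with_identifier else "")
    return [m.group("name") for m in re.finditer(pattern, text) if m.group("name")]


print(found_classes("Xclass Foo", with_identifier=False))  # ['Foo'] (wrong)
print(found_classes("Xclass Foo", with_identifier=True))   # []
print(found_classes("class Foo", with_identifier=True))    # ['Foo']
```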
The grammar can be tested with the following class:
using System;
using Antlr.Runtime;
namespace Demo
{
class MainClass
{
public static void Main (string[] args)
{
string source =
@"class TestClass
{
int a = 42;
string _class = ""inside a string literal: class FooBar {}..."";
class Nested {
/* class NotAClass {} */
// class X { }
class DoubleNested {
string str = @""
multi line string
class Bar {}
"";
}
}
}";
Console.WriteLine("source=\n" + source + "\n-------------------------");
ANTLRStringStream input = new ANTLRStringStream(source);
CSharpClassLexer lexer = new CSharpClassLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.GetTokens();
}
}
}
which produces the following output:
source=
class TestClass
{
int a = 42;
string _class = "inside a string literal: class FooBar {}...";
class Nested {
/* class NotAClass {} */
// class X { }
class DoubleNested {
string str = @"
multi line string
class Bar {}
";
}
}
}
-------------------------
Found class: TestClass
Found class: Nested
Found class: DoubleNested
Note that this is just a quick demo: I'm not sure I handled C# string literals correctly in the grammar (I am unfamiliar with C#), but it should give you a start.
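On the string literals: classic C# has two forms. Regular literals ("...") use backslash escapes and cannot contain a raw line break. Verbatim literals (@"...") may span lines and write a double quote as "". The demo's two String alternatives appear to follow those rules. The Python sketch below mirrors the demo's rule order with one regular expression, so you can check that class-like text inside strings and comments is not reported. It approximates the ANTLR lexer and is not generated code. TOKEN and class_names are hypothetical names. Neither the demo nor this sketch tries to handle string forms added in later C# versions (interpolated and raw string literals) or char literals such as '"'.

```python
import re

# Alternatives in the same order as the demo's lexer rules (Space is left out:
# unmatched characters are skipped anyway). Python tries the alternatives of
# "|" left to right, and finditer skips characters where nothing matches.
TOKEN = re.compile(
    r"""
      //[^\r\n]*                                   # Comment: line comment
    | /\*.*?\*/                                    # Comment: block comment
    | "(?:\\.|[^"\\\r\n])*"                        # String: regular literal
    | @"(?:""|[^"])*"                              # String: verbatim literal
    | class\s+(?P<name>[A-Za-z_][A-Za-z0-9_]*)     # Class
    | [A-Za-z_][A-Za-z0-9_]*                       # Identifier
    """,
    re.VERBOSE | re.DOTALL,
)


def class_names(source):
    # Only Class matches set the "name" group; all other tokens are discarded.
    return [m.group("name") for m in TOKEN.finditer(source) if m.group("name")]


SOURCE = '''class TestClass
{
    string s = "class FooBar {}";
    string e = "a \\"class Q\\" b";
    string v = @"say ""hi""
class Bar {}";
    /* class NotAClass {} */
    // class X { }
    class Nested { }
}'''

print(class_names(SOURCE))  # ['TestClass', 'Nested']
```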
Good luck!
Source: https://stackoverflow.com/questions/4914073/partial-grammar-for-counting-class-count