Question
I built a module to count the lines of code (LOC) of a Java project. For this purpose I had to ignore:
- blank lines
- single-line comments
- multiline comments (/* ... */).
I handled the first two with a list comprehension over the file's lines using regexes, and I solved the third by visiting the whole file as a string with the appropriate pattern matching and substitution. Is there a better and/or more performant way to achieve the same goal?
PS: I opted for substitution, even though it is heavier than counting and subtracting, because multiline comments can be interleaved with actual code on the same line. Here is an example of tricky multiline comments:
String test2 = "abc /* fake comment*/";
String cde = "this is a test";//an inline comment
String efg = "ciccio"; /*this is a
weird comment*/ String hil = "pluto";
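For comparison, here is a minimal Java sketch of the substitution idea. It is not the asker's Rascal code and uses a swapped-in technique: a hand-written character scanner instead of regexes. It blanks out comments while copying string and char literals verbatim, so the fake comment inside test2's string survives, and it keeps newlines so line numbers stay intact. The names CommentStripper, stripComments and countLoc are hypothetical; it assumes Java 11+ and, for brevity, ignores text blocks and unicode escapes.

```java
import java.util.List;

public class CommentStripper {

    // Replaces every comment character with a space but keeps newlines, so the
    // line structure survives. String and char literals are copied verbatim,
    // so comment markers inside them are not treated as comments.
    // Not handled (assumption for brevity): text blocks and unicode escapes.
    public static String stripComments(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            char next = i + 1 < n ? src.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // Line comment: blank out everything up to (not including) the newline.
                while (i < n && src.charAt(i) != '\n') {
                    out.append(' ');
                    i++;
                }
            } else if (c == '/' && next == '*') {
                // Block comment: blank out through the closing */ but keep newlines.
                out.append("  ");
                i += 2;
                while (i < n && !(src.charAt(i) == '*' && i + 1 < n && src.charAt(i + 1) == '/')) {
                    out.append(src.charAt(i) == '\n' ? '\n' : ' ');
                    i++;
                }
                if (i < n) {
                    out.append("  ");
                    i += 2;
                }
            } else if (c == '"' || c == '\'') {
                // String or char literal: copy it unchanged, honouring backslash escapes.
                out.append(c);
                i++;
                while (i < n && src.charAt(i) != c && src.charAt(i) != '\n') {
                    if (src.charAt(i) == '\\' && i + 1 < n) {
                        out.append(src.charAt(i)).append(src.charAt(i + 1));
                        i += 2;
                    } else {
                        out.append(src.charAt(i));
                        i++;
                    }
                }
                if (i < n && src.charAt(i) == c) {
                    out.append(c);
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // A line counts as code if anything other than whitespace is left after stripping.
    public static long countLoc(String src) {
        return stripComments(src).lines().filter(line -> !line.isBlank()).count();
    }

    public static void main(String[] args) {
        // The tricky lines from the question, joined with newlines.
        String sample = String.join("\n", List.of(
                "String test2 = \"abc /* fake comment*/\";",
                "String cde = \"this is a test\";//an inline comment",
                "String efg = \"ciccio\"; /*this is a",
                "weird comment*/ String hil = \"pluto\";"));
        System.out.println(countLoc(sample)); // prints 4
    }
}
```

All four sample lines still contain code after stripping (the last one keeps String hil = "pluto"; after the comment closes), so the count is 4. A single left-to-right pass avoids repeatedly re-scanning the file, at the cost of tracking literal and comment state by hand.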
Answer 1:
Yes, there are several different approaches you can try.
- My first choice would be to write a grammar for files with comments, using rules such as
  lexical SingleLineComment = "//" ![\n]* "\n";
  and
  lexical OtherStuff = ![\\]+ !>> ![\\];
  The parse tree that comes out can be visited to count the size of all the comments, and you can subtract that from the total amount.
- Use an existing Java grammar from the library module lang::java to parse the files, and analyze the parse tree in the same way.
- Use an existing external parser (such as the JDT) and collect the start lines of all AST nodes. Lines on which an AST node starts are not empty; the others are. So subtraction is your friend again.
- Anchor your regexes better (i.e. with ^ and $) so they become less non-deterministic; that way the visit you wrote becomes faster.
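The anchoring suggestion can be illustrated with a small Java analogue of the asker's line filter (a hypothetical stand-in for the Rascal list comprehension, not Rascal code; the names LineFilter and codeLines are made up). Two precompiled, anchored patterns drop blank lines and whole-line // comments. Besides the performance effect the answer describes, anchoring also matters for correctness when using find(): an unanchored // pattern would match a code line with a trailing comment and wrongly drop it. Block comments are deliberately not handled here.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LineFilter {
    // ^ anchors both patterns at the start of the line (no MULTILINE flag is set),
    // so find() cannot report a match that begins in the middle of a line.
    // BLANK is also closed with $, so the whole line must be whitespace.
    private static final Pattern BLANK = Pattern.compile("^\\s*$");
    private static final Pattern LINE_COMMENT = Pattern.compile("^\\s*//");

    // Keeps only lines that are neither blank nor whole-line // comments.
    // Block comments are NOT handled by this filter.
    public static List<String> codeLines(List<String> lines) {
        return lines.stream()
                .filter(l -> !BLANK.matcher(l).find())
                .filter(l -> !LINE_COMMENT.matcher(l).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "int a = 1;", "   ", "// comment", "  // indented", "int b = 2; // trailing", "");
        System.out.println(codeLines(lines)); // prints [int a = 1;, int b = 2; // trailing]
    }
}
```

Compiling the patterns once as constants, rather than per line, avoids repeated regex compilation inside the loop.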
It's advisable to enable the Rascal CPU profiler in the REPL:
:set profiling true
Then look for the AST node that is the actual bottleneck in the profile, which is printed after running a test.
Source: https://stackoverflow.com/questions/59179175/what-is-the-best-way-to-ignore-comments-in-a-java-file-with-rascal