Finding VBA Comments using RegEx

予麋鹿 2021-01-23 22:49

I am trying to find all VBA comments using regular expressions. I have something that mostly works, but there are a few exceptions that I cannot figure out.

Expression I

3 Answers
  • 2021-01-23 23:33

    Maybe something like

    ^(?:[^"'\n]*("(?:[^"\n]|"")*"))*[^"]*'(.*)$
    

    It handles multiple quoted strings, as well as strings containing doubled double quotes (""), which I believe is how VBA escapes a quote inside a string.

    (I guarantee it will fail in some cases, but probably will work in most ;)

    Check it out here at regex101.
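
As a rough way to try the first expression outside regex101, here is a minimal Python sketch. It assumes the pattern is applied to one physical line at a time, and `find_comment` is a hypothetical helper name, not something from the answer. The comment text ends up in the second capture group.

```python
import re

# The first expression from the answer, unchanged.
PATTERN = re.compile(r'''^(?:[^"'\n]*("(?:[^"\n]|"")*"))*[^"]*'(.*)$''')


def find_comment(line):
    """Return the text after the comment marker, or None if nothing matched."""
    match = PATTERN.search(line)
    return match.group(2) if match else None


print(find_comment('Debug.Print "It\'s a string" \'real comment'))  # real comment
print(find_comment('s = "say ""hi"" \'not a comment" \'yes'))       # yes
print(find_comment('Debug.Print "It\'s a string"'))                 # None
print(find_comment("x = 1 'don't do this"))                         # t do this
```

The last call is one of the promised failures: an apostrophe inside the comment lets the greedy `[^"]*` run up to the last `'` on the line, so only the tail of the comment is captured.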

    Edit

    Added some of Comintern's examples and adjusted the regex. It still can't handle bracketed identifiers, though (I don't even know what those are :S, see the last line). It does now handle his continued-line comments.

    ^(?:[^"'\n]*(?:"(?:[^"\n]|"")*"))*[^']*('(?:_\n|.)*)
    

    Check it out here at regex101.
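
The adjusted expression can be tried the same way. This is only a sketch: it assumes the whole module is passed as one string with `\n` line endings and that the pattern is compiled with the multiline flag, and `find_comments` is a hypothetical helper name.

```python
import re

# The adjusted expression from the answer, unchanged; MULTILINE lets ^ match
# at the start of every physical line.
PATTERN = re.compile(
    r'''^(?:[^"'\n]*(?:"(?:[^"\n]|"")*"))*[^']*('(?:_\n|.)*)''',
    re.MULTILINE,
)


def find_comments(source):
    """Return every comment the pattern captures, continuation lines included."""
    return [match.group(1) for match in PATTERN.finditer(source)]


source = "\n".join([
    "Debug.Print 42 'The next line... _",
    '"...is not a string literal"',
    'Debug.Print "It\'s fine" \'plain comment',
])
for comment in find_comments(source):
    print(repr(comment))

# An underscore with no whitespace before it is not a continuation in VBA,
# but the pattern treats it as one and swallows the following code line.
print(find_comments("Debug.Print 'not a continuation_\nDebug.Print 42"))
```

Exported modules usually have CRLF line endings, so the text probably needs normalizing to `\n` first, since `_\n` will not match `_\r\n`.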

  • 2021-01-23 23:35

    This should work:

    ("[^"]+"\s)?'.+
    

    Tested here: https://regex101.com/r/dd60QS/1
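
For comparison, a small Python sketch of how this expression behaves on two inputs. `match_comment` is a hypothetical helper name, and the sketch assumes line-by-line use. Note that the optional group makes the match include a preceding string literal, and that a string containing an apostrophe with no comment after it still produces a match.

```python
import re

# The expression from the answer, unchanged.
PATTERN = re.compile(r'''("[^"]+"\s)?'.+''')


def match_comment(line):
    """Return the full matched text, or None if the pattern finds nothing."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


# The match starts at the string literal, not at the comment marker.
print(match_comment('Debug.Print "Interleaved single \'n double quotes." \'Comment"'))

# False positive: there is no comment on this line at all.
print(match_comment('Debug.Print "It\'s a string"'))  # 's a string"
```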

  • 2021-01-23 23:45

    You can't find all of the comments (let alone string literals) in VBA code with regular expressions - period. Trust me, I tried during work on the Smart Indenter module of Rubberduck (in case that wasn't explicit enough - full disclosure, I'm a contributor). You'll need to actually parse the code. The first issue that you'll run into is line continuations:

    'Comment with a line _
    continuation
    
    Debug.Print 'End of line comment _
    with line continuation.
    
    Debug.Print 'Multiple line continuation operators _ _
    still work.
    
    Debug.Print 'This is actually *not* a line continuation_
    Debug.Print 42
    

    This makes it difficult to identify string literals, especially if you're using line-by-line processing:

    Debug.Print 42 'The next line... _
    "...is not a string literal"
    
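One way to soften the line-by-line problem is to join physical lines into logical lines before looking for anything else. The Python sketch below does that under one stated assumption, taken from the examples above: a line continues only when it ends with whitespace followed by an underscore. `logical_lines` is a hypothetical helper name.

```python
import re

# Assumption: a continuation is whitespace plus an underscore at the end of a
# physical line (trailing blanks tolerated). "continuation_" does not qualify.
CONTINUATION = re.compile(r"[ \t]_[ \t]*$")


def logical_lines(source):
    """Join physical lines that end in a continuation marker."""
    result = []
    pending = []
    for line in source.splitlines():
        pending.append(line)
        if not CONTINUATION.search(line):
            result.append("\n".join(pending))
            pending = []
    if pending:  # the source ended on a continuation marker
        result.append("\n".join(pending))
    return result


source = "\n".join([
    "Debug.Print 'End of line comment _",
    "with line continuation.",
    "Debug.Print 'This is actually *not* a line continuation_",
    "Debug.Print 42",
])
for line in logical_lines(source):
    print(repr(line))
```

This only groups lines. Deciding whether a quote on the second physical line opens a string still requires knowing whether the logical line is already inside a comment.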

    You also have to handle the old Rem comment syntax...

    Rem old school comment
    

    ...which also supports line continuations:

    Rem old school comment with line _
    continuation.
    

    You might be thinking "that can't be so bad, Rem has to start a line". If you are, you forgot about the statement separator (:)...

    Debug.Print 42: Rem statement separator comment.
    

    ...or its evil twin, the statement separator combined with a line continuation:

    Debug.Print 42: Rem this can be _
    continued too.
    

    You covered a couple of the issues with sorting out string literals and comments like these...

    Debug.Print "Unmatched double quotes." 'Comment"
    Debug.Print "Interleaved single 'n double quotes." 'Comment"
    

    ...but what about bracketed identifiers like this beast (courtesy of @ThunderFrame)?

    'No comments or strings in the line below.
    Debug.Print [Evil:""Comment"'here] 
    

    Note that the syntax highlighter SO uses doesn't even catch all of these bizarre corner cases.
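
To give a feel for what "actually parse the code" can mean at its simplest, here is a hand-written scanner sketch in Python. It is not Rubberduck's approach and not a full VBA lexer. It assumes `\n` line endings, treats `Rem` as a comment only at the start of a line or after `:`, and handles only the cases listed in this answer. All function names are hypothetical.

```python
def is_rem(source, i):
    """True if a Rem keyword followed by whitespace or end of input starts at i."""
    if source[i:i + 3].lower() != "rem":
        return False
    following = source[i + 3:i + 4]
    return following == "" or following.isspace()


def comment_end(source, i):
    """Index just past a comment starting at i, following ' _' continuations."""
    while True:
        end = source.find("\n", i)
        if end == -1:
            return len(source)
        if not source[:end].endswith((" _", "\t_")):
            return end
        i = end + 1


def skip_string(source, i):
    """Index just past the string literal whose opening quote is at i."""
    i += 1
    while i < len(source) and source[i] != "\n":
        if source[i] == '"':
            if source[i + 1:i + 2] == '"':  # "" is an escaped quote
                i += 2
                continue
            return i + 1
        i += 1
    return i  # unterminated string: stop at the end of the line


def skip_bracket(source, i):
    """Index just past a bracketed identifier starting at i."""
    end = source.find("]", i)
    newline = source.find("\n", i)
    if end == -1 or (newline != -1 and newline < end):
        return i + 1  # no closing bracket on this line: treat '[' as ordinary
    return end + 1


def find_comments(source):
    """Collect comments in one left-to-right pass over the source text."""
    comments = []
    i = 0
    at_statement_start = True  # start of a line, or just after ':'
    while i < len(source):
        ch = source[i]
        if ch == "'" or (at_statement_start and is_rem(source, i)):
            end = comment_end(source, i)
            comments.append(source[i:end])
            i = end
        elif ch == '"':
            i = skip_string(source, i)
            at_statement_start = False
        elif ch == "[":
            i = skip_bracket(source, i)
            at_statement_start = False
        else:
            if ch in "\n:":
                at_statement_start = True
            elif not ch.isspace():
                at_statement_start = False
            i += 1
    return comments


source = "\n".join([
    'Debug.Print "Unmatched double quotes." \'Comment"',
    'Debug.Print [Evil:""Comment"\'here]',
    "Debug.Print 42: Rem this can be _",
    "continued too.",
    "Remainder = 5",
])
for comment in find_comments(source):
    print(repr(comment))
```

Tracking state character by character is what lets the scanner skip the bracketed identifier and the apostrophes inside strings. Anything it does not model, such as other places where `Rem` may legally appear, is still a gap.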
