Regex that Will Match a Java Method Declaration

半阙折子戏 2020-11-27 17:53

I need a regex that will match a Java method declaration. I have come up with one that matches a method declaration, but it requires the opening bracket of the method to

14 answers
  • 2020-11-27 18:41

    After looking through the other answers, here is what I came up with:

    #permission
       ^[ \t]*(?:(?:public|protected|private)\s+)?
    #keywords
       (?:(static|final|native|synchronized|abstract|threadsafe|transient|{#insert zJRgx123GenericsNotInGroup})\s+){0,}
    #return type
       #If return type is "return" then it's actually a 'return funcName();' line. Ignore.
       (?!return)
       \b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,})\s+
    #function name
       \b\w+\b\s*
    #parameters
       \(
          #one
             \s*(?:\b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])
          #two and up
             \s*(?:,\s+\b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])\s*){0,})?\s*
       \)
    #post parameters
       (?:\s*throws [\w.]+(\s*,\s*[\w.]+))?
    #close-curly (concrete) or semi-colon (abstract)
       \s*(?:\{|;)[ \t]*$
    

    Where {#insert zJRgx123GenericsNotInGroup} equals

    `(?:<[?\w\[\] ,.&]+>)|(?:<[^<]*<[?\w\[\] ,.&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,.&]+>[^>]*>[^>]*>)`
    

    Limitations:

    • ANY parameter can have an ellipsis: "..." (Java allows only last)
    • Three levels of nested generics at most: (<...<...<...>...>...> okay, <...<...<...<...>...>...>...> bad). The syntax inside generics can be very bogus, and still seem okay to this regex.
    • Requires no spaces between types and their (optional) opening generics '<'
    • Recognizes inner classes, but doesn't prevent two dots next to each other, such as Class....InnerClass

    Below is the raw PhraseExpress code (auto-text and description on line 1, body on line 2). Call {#insert zJRgxJavaFuncSigThrSemicOrOpnCrly}, and you get this:

    ^[ \t]*(?:(?:public|protected|private)\s+)?(?:(static|final|native|synchronized|abstract|threadsafe|transient|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))\s+){0,}(?!return)\b([\w.]+)\b(?:|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))((?:\[\]){0,})\s+\b\w+\b\s*\(\s*(?:\b([\w.]+)\b(?:|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])\s*(?:,\s+\b([\w.]+)\b(?:|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])\s*){0,})?\s*\)(?:\s*throws [\w.]+(\s*,\s*[\w.]+))?\s*(?:\{|;)[ \t]*$
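The way the fragments combine can be sketched in Java with plain string concatenation. This is only an illustration under assumptions: the class name `MethodSignatureRegex`, the constant names, and the helper `isMethodDeclaration` are hypothetical, and the fragments are transcribed from the annotated version above. Trying it on a few lines also suggests one behaviour that is not in the limitations list: as transcribed, the `throws` clause appears to match only when exactly two exception types are listed.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the original answer.
class MethodSignatureRegex {
    // Up to three levels of nested generics (zJRgx123GenericsNotInGroup).
    static final String GENERICS =
        "(?:<[?\\w\\[\\] ,.&]+>)"
        + "|(?:<[^<]*<[?\\w\\[\\] ,.&]+>[^>]*>)"
        + "|(?:<[^<]*<[^<]*<[?\\w\\[\\] ,.&]+>[^>]*>[^>]*>)";

    // Type name, optional generics, optional array brackets.
    static final String TYPE =
        "\\b([\\w.]+)\\b(?:|" + GENERICS + ")((?:\\[\\]){0,})";

    // One parameter: type, optional ellipsis, parameter name.
    static final String PARAM =
        TYPE + "(\\.\\.\\.)?\\s+(\\w+)\\b(?![>\\[])";

    // Zero or more parameters separated by a comma plus whitespace.
    static final String PARAMS =
        "\\s*(?:" + PARAM + "\\s*(?:,\\s+" + PARAM + "\\s*){0,})?\\s*";

    static final String SIGNATURE =
        "^[ \\t]*(?:(?:public|protected|private)\\s+)?"
        + "(?:(static|final|native|synchronized|abstract|threadsafe|transient|"
        + GENERICS + ")\\s+){0,}"
        + "(?!return)" + TYPE + "\\s+"          // return type, but not 'return'
        + "\\b\\w+\\b\\s*"                      // function name
        + "\\(" + PARAMS + "\\)"                // parameter list
        + "(?:\\s*throws [\\w.]+(\\s*,\\s*[\\w.]+))?"
        + "\\s*(?:\\{|;)[ \\t]*$";              // open curly or semicolon

    static final Pattern METHOD = Pattern.compile(SIGNATURE);

    // Tests a single line of source (no MULTILINE flag is used here).
    static boolean isMethodDeclaration(String line) {
        return METHOD.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "    public static void main(String[] args) {",
            "    abstract Map<String, List<Integer>> lookup(String key, int... ids);",
            "    return compute();",
            "void run() throws IOException {",
            "void run() throws IOException, SQLException {"
        };
        for (String line : lines) {
            System.out.println(isMethodDeclaration(line) + "  " + line);
        }
    }
}
```

To scan a whole file rather than one line at a time, the pattern would presumably need `Pattern.MULTILINE` so that `^` and `$` anchor at each line.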
    

    Raw code:

    zJRgx123GenericsNotInGroup -- To precede return-type    (?:<[?\w\[\] ,.&]+>)|(?:<[^<]*<[?\w\[\] ,.&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,.&]+>[^>]*>[^>]*>)  zJRgx123GenericsNotInGroup
    zJRgx0OrMoreParams  \s*(?:{#insert zJRgxParamTypeName}\s*(?:,\s+{#insert zJRgxParamTypeName}\s*){0,})?\s*   zJRgx0OrMoreParams
    zJRgxJavaFuncNmThrClsPrn_M_fnm -- Needs zvFOBJ_NAME (?<=\s)\b{#insert zvFOBJ_NAME}{#insert zzJRgxPostFuncNmThrClsPrn}   zJRgxJavaFuncNmThrClsPrn_M_fnm
    zJRgxJavaFuncSigThrSemicOrOpnCrly -(**)-    {#insert zzJRgxJavaFuncSigPreFuncName}\w+{#insert zzJRgxJavaFuncSigPostFuncName}    zJRgxJavaFuncSigThrSemicOrOpnCrly
    zJRgxJavaFuncSigThrSemicOrOpnCrly_M_fnm -- Needs zvFOBJ_NAME    {#insert zzJRgxJavaFuncSigPreFuncName}{#insert zvFOBJ_NAME}{#insert zzJRgxJavaFuncSigPostFuncName}  zJRgxJavaFuncSigThrSemicOrOpnCrly_M_fnm
    zJRgxOptKeywordsBtwScopeAndRetType  (?:(static|final|native|synchronized|abstract|threadsafe|transient|{#insert zJRgx123GenericsNotInGroup})\s+){0,}    zJRgxOptKeywordsBtwScopeAndRetType
    zJRgxOptionalPubProtPriv    (?:(?:public|protected|private)\s+)?    zJRgxOptionalPubProtPriv
    zJRgxParamTypeName -(**)- Ends w/ '\b(?![>\[])' to NOT find <? 'extends XClass'> or ...[]>  (*Original: zJRgxParamTypeName, Needed by: zJRgxParamTypeName[4FQPTV,ForDel[NmsOnly,Types]]*){#insert zJRgxTypeW0123GenericsArry}(\.\.\.)?\s+(\w+)\b(?![>\[])   zJRgxParamTypeName
    zJRgxTypeW0123GenericsArry -- Grp1=Type, Grp2='[]', if any  \b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,}) zJRgxTypeW0123GenericsArry
    zvTTL_PRMS_stL1c    {#insert zCutL1c}{#SETPHRASE -description zvTTL_PRMS -content {#INSERTCLIPBOARD} -autotext zvTTL_PRMS -folder ctvv_folder}  zvTTL_PRMS_stL1c
    zvTTL_PRMS_stL1cSvRstrCB    {#insert zvCB_CONTENTS_stCB}{#insert zvTTL_PRMS_stL1c}{#insert zSetCBToCB_CONTENTS} zvTTL_PRMS_stL1cSvRstrCB
    zvTTL_PRMS_stPrompt {#SETPHRASE -description zvTTL_PRMS -content {#INPUT -head How many parameters? -single} -autotext zvTTL_PRMS -folder ctvv_folder}  zvTTL_PRMS_stPrompt
    zzJRgxJavaFuncNmThrClsPrn_M_fnmTtlp -- Needs zvFOBJ_NAME, zvTTL_PRMS    (?<=[ \t])\b{#insert zvFOBJ_NAME}\b\s*\(\s*{#insert {#COND -if {#insert zvTTL_PRMS} = 0 -then z1slp -else zzParamsGT0_M_ttlp}}\)    zzJRgxJavaFuncNmThrClsPrn_M_fnmTtlp
    zzJRgxJavaFuncSigPostFuncName   {#insert zzJRgxPostFuncNmThrClsPrn}(?:\s*throws \b(?:[\w.]+)\b(\s*,\s*\b(?:[\w.]+)\b))?\s*(?:\{|;)[ \t]*$   zzJRgxJavaFuncSigPostFuncName
    zzJRgxJavaFuncSigPreFuncName    (*If a type has generics, there may be no spaces between it and the first open '<', also requires generics with three nestings at the most (<...<...<...>...>...> okay, <...<...<...<...>...>...>...> not)*)^[ \t]*{#insert zJRgxOptionalPubProtPriv}{#insert zJRgxOptKeywordsBtwScopeAndRetType}(*To prevent 'return funcName();' from being recognized:*)(?!return){#insert zJRgxTypeW0123GenericsArry}\s+\b  zzJRgxJavaFuncSigPreFuncName
    zzJRgxPostFuncNmThrClsPrn   \b\s*\({#insert zJRgx0OrMoreParams}\)   zzJRgxPostFuncNmThrClsPrn
    zzParamsGT0_M_ttlp -- Needs zvTTL_PRMS  {#insert zJRgxParamTypeName}\s*{#insert {#COND -if {#insert zvTTL_PRMS} = 1 -then z1slp -else zzParamsGT1_M_ttlp}}  zzParamsGT0_M_ttlp
    zzParamsGT1_M_ttlp  {#LOOP ,\s+{#insert zJRgxParamTypeName}\s* -count {#CALC {#insert zvTTL_PRMS} - 1 -round 0 -thousands none}}    zzParamsGT1_M_ttlp
    
  • 2020-11-27 18:42

    I came up with this:

    \b\w*\s*\w*\(.*?\)\s*\{[\x21-\x7E\s]*\}
    

    I tested it against a PHP function, but it should work just the same for Java. This is the snippet I used:

    function getProfilePic($url)
     {
        if(@open_image($url) !== FALSE)
         {
            @imagepng($image, 'images/profiles/' . $_SESSION['id'] . '.png');
            @imagedestroy($image);
            return TRUE;
         }
        else 
         {
            return FALSE;
         }
     }
    

    MORE INFO:

    Options: case insensitive
    
    Assert position at a word boundary «\b»
    Match a single character that is a “word character” (letters, digits, etc.) «\w*»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match a single character that is a “word character” (letters, digits, etc.) «\w*»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match the character “(” literally «\(»
    Match any single character that is not a line break character «.*?»
       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
    Match the character “)” literally «\)»
    Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match the character “{” literally «\{»
    Match a single character present in the list below «[\x21-\x7E\s]*»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
       A character in the range between ASCII character 0x21 (33 decimal) and ASCII character 0x7E (126 decimal) «\x21-\x7E»
       A whitespace character (spaces, tabs, line breaks, etc.) «\s»
    Match the character “}” literally «\}»
    
    
    Created with RegexBuddy
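One consequence of the greedy `[\x21-\x7E\s]*` step described above is worth checking before relying on this pattern: the body match runs to the last `}` in the input, not to the brace that closes the first function. The following is a minimal sketch in Java, assuming the pattern is used unchanged with the case-insensitive option; the class name `GreedyBodyDemo` and the helper `matches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself comes from the answer above.
class GreedyBodyDemo {
    static final Pattern FUNCTION = Pattern.compile(
        "\\b\\w*\\s*\\w*\\(.*?\\)\\s*\\{[\\x21-\\x7E\\s]*\\}",
        Pattern.CASE_INSENSITIVE);

    // Collects every match found in the given source text.
    static List<String> matches(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = FUNCTION.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source =
            "void first() { int x = 1; }\n"
            + "void second() { int y = 2; }\n";
        List<String> found = matches(source);
        // Both functions end up inside a single match.
        System.out.println(found.size());
        System.out.println(found.get(0));
    }
}
```

With a single function in the input the match covers exactly that function, so the pattern seems best suited to text that has already been cut down to one function.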
    
  • 2020-11-27 18:45

    I'm pretty sure Java's regex engine is greedy by default, meaning that "\w+ +\w+ *\(.*\) *\{" will never match, since the .* within the parentheses will eat everything after the opening paren. I recommend replacing the .* with [^)]*; that way it will only consume characters that are not a closing paren.

    NOTE: Mike Stone corrected me in the comments, and since most people don't open the comments (I know I frequently don't notice them), here is his correction:

    Greedy doesn't mean it will never match... but it will eat parens if there are more parens after to satisfy the rest of the regex... so for example "public void foo(int arg) { if (test) { System.exit(0); } }" will not match properly...
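The difference can be checked with a small Java sketch that runs both variants on the example line from the correction. The class name `GreedyParenDemo` and the helper `firstMatch` are hypothetical, and `[^)]*` is assumed as the replacement for `.*`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the greedy and negated-class variants.
class GreedyParenDemo {
    static final Pattern GREEDY =
        Pattern.compile("\\w+ +\\w+ *\\(.*\\) *\\{");
    static final Pattern NEGATED =
        Pattern.compile("\\w+ +\\w+ *\\([^)]*\\) *\\{");

    // Returns the first match of the pattern in the text, or null if none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "public void foo(int arg) { if (test) { System.exit(0); } }";
        // Greedy .* runs on to the last ") {" it can find.
        System.out.println(firstMatch(GREEDY, line));   // void foo(int arg) { if (test) {
        // [^)]* stops at the first closing paren.
        System.out.println(firstMatch(NEGATED, line));  // void foo(int arg) {
    }
}
```

Both matches start at `void` rather than `public`, because the pattern only allows for two words before the opening paren.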

  • 2020-11-27 18:46

    As of git 2.19.0, the built-in regexp for Java now seems to work well, so supplying your own may not be necessary.

    "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
    "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$"
    

    (The first line seems to be for filtering out lines that resemble method declarations but aren't.)
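The two-step logic (exclude first, then match) can be sketched in Java as follows. This is an assumption-laden translation: the patterns are carried over literally into `java.util.regex` syntax, the leading `!` is modelled as a separate exclusion check, and the names `GitJavaFuncname`, `EXCLUDE`, `INCLUDE`, and `looksLikeMethodHeader` are hypothetical. It is not how Git itself evaluates the patterns internally.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the exclude-then-include idea behind the two patterns.
class GitJavaFuncname {
    // Lines starting with one of these keywords are rejected up front.
    static final Pattern EXCLUDE = Pattern.compile(
        "^[ \\t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)");

    // One or more identifiers, then a name, an opening paren, and no semicolon.
    static final Pattern INCLUDE = Pattern.compile(
        "^[ \\t]*(([A-Za-z_][A-Za-z_0-9]*[ \\t]+)+"
        + "[A-Za-z_][A-Za-z_0-9]*[ \\t]*\\([^;]*)$");

    static boolean looksLikeMethodHeader(String line) {
        if (EXCLUDE.matcher(line).find()) {
            return false;
        }
        return INCLUDE.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "    public void foo(int a) {",
            "    throw new IllegalStateException(",
            "    int x = foo(1);"
        };
        for (String line : lines) {
            System.out.println(looksLikeMethodHeader(line) + "  " + line);
        }
    }
}
```

The second sample line shows why the exclusion is needed: on its own, the include pattern would accept a multi-line `throw new ...(` call as if it were a declaration.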

  • 2020-11-27 18:47

    Have you considered matching the actual possible keywords, such as:

    (?:(?:public|private|static|protected)\s+)*
    

    It might be a bit more likely to match correctly, though it might also make the regex harder to read...
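A detail worth checking in a keyword group like this is where the `\s+` sits. Outside the alternation it follows every keyword; inside the last alternative it follows only that one. The Java sketch below compares the two placements. The class name `ModifierPrefixDemo`, both pattern constants, and the helper `modifiersOf` are hypothetical and exist only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of how the placement of \s+ changes what the group consumes.
class ModifierPrefixDemo {
    // \s+ outside the alternation: required after each keyword.
    static final Pattern SPACE_AFTER_EACH = Pattern.compile(
        "^\\s*((?:(?:public|private|static|protected)\\s+)*)");

    // \s+ inside the last alternative: required only after 'protected'.
    static final Pattern SPACE_AFTER_LAST_ONLY = Pattern.compile(
        "^\\s*((?:(?:public)|(?:private)|(?:static)|(?:protected)\\s+)*)");

    // Returns the leading modifiers captured by the pattern, trimmed.
    static String modifiersOf(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String line = "public static void main(String[] args) {";
        System.out.println(modifiersOf(SPACE_AFTER_EACH, line));       // public static
        System.out.println(modifiersOf(SPACE_AFTER_LAST_ONLY, line));  // public
    }
}
```

In the second variant the repetition stops after `public`, because the next iteration starts at a space that none of the alternatives can consume.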

  • 2020-11-27 18:48

    This picks out the name of the method rather than the whole line.

    (?<=public static void )\w+|(?<=private static void )\w+|(?<=protected static void )\w+|(?<=public void )\w+|(?<=private void )\w+|(?<=protected void )\w+|(?<=public final void )\w+|(?<=private final void )\w+|(?<=protected final void )\w+|(?<=public static final void )\w+|(?<=private static final void )\w+|(?<=public final static void )\w+|(?<=protected final static void )\w+|(?<=private final static void )\w+|(?<=void )\w+|(?<=private static )\w+
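A minimal Java sketch of the lookbehind idea, using only three of the alternatives to keep it short. The class name `MethodNameLookbehind` and the helper `methodName` are hypothetical. Because every lookbehind here ends in `void `, this shortened version finds names of `void` methods only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: the lookbehind asserts the prefix without including it in the match.
class MethodNameLookbehind {
    static final Pattern NAME = Pattern.compile(
        "(?<=public static void )\\w+"
        + "|(?<=public void )\\w+"
        + "|(?<=void )\\w+");

    // Returns the first method name found on the line, or null if there is none.
    static String methodName(String line) {
        Matcher m = NAME.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(methodName("public static void main(String[] args) {")); // main
        System.out.println(methodName("    private void reset() {"));               // reset
        System.out.println(methodName("public int size() {"));                      // null
    }
}
```

Java requires lookbehind patterns to have a bounded length, which fixed strings such as these satisfy.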
    