Take an example.
public static FieldsConfig getFieldsConfig(){
if(xxx) {
sssss;
}
return;
}
I write a regex, \"\\\\
Victor, you asked me to look at your answer, so I decided to take the time to write a full review of it and give some hints. I'm not a regex professional, nor do I like regexes very much. I'm currently working on a project that uses regex heavily, so I've seen and written enough of it to answer your question fairly reliably, and to get sick of regexes. So let's start the analysis of your regex:
String regex ="\\s*public\\s*static.*getFieldsConfig\\(.*?\\)\\s*\\{.*\\}(?=\\s*(public|private|protected|static))";
String regex2 = "\\s*public\\s*static.*getFieldsConfig\\(.*?\\)\\s*\\{.*\\}(?=(\\s*}\\s*$))";
regex = "(" + regex +")|("+ regex2 + "){1}?";
I see you've split it into three parts for readability. That's a good idea. I'll start with the first part:
\\s*public\\s*static.*getFieldsConfig
You allow any number of whitespace characters, including zero, between public and static, so it could match publicstatic. Always use \\s+ between words that must be separated by at least one whitespace character.
\\(.*?\\)\\s*\\{.*\\}
You allow anything to appear between the first parentheses: .*? matches any character up to a ). Now we reach the part that makes your regex behave differently from what you wanted. \\{.*\\} is a major mistake. Because .* is greedy, it matches everything up to the last } in the file that is followed by any of public, private, protected or static. I pasted your getFieldsConfig method into a Java file and tested it. Using only the first part of your regex ("\\s*public\\s*static.*getFieldsConfig\\(.*?\\)\\s*\\{.*\\}(?=\\s*(public|private|protected|static))") matched everything from your method through to the last method in the file.
There is no point in analyzing the other parts step by step, because \\{.*\\} ruins everything. In the second part (regex2) you match everything from your method to the last } in the file. Have you tried printing what your regex matches? Try it:
package com.tryRegex;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TryRegex{
public static void main(String[] args) throws IOException{
File yourFile = new File("tryFile.java");
Scanner scanner = new Scanner(yourFile, "UTF-8");
String text = scanner.useDelimiter("\\A").next(); // `\\A` matches only the beginning of the input. Since a file has only one beginning, the whole file is returned as a single token.
String regex ="\\s*public\\s*static.*getFieldsConfig\\(.*?\\)\\s*\\{.*\\}(?=\\s*(public|private|protected|static))";
String regex2 = "\\s*public\\s*static.*getFieldsConfig\\(.*?\\)\\s*\\{.*\\}(?=(\\s*}\\s*$))";
regex = "(?s)(" + regex +")|("+ regex2 + "){1}?"; // I've included (?s) because we are reading from a file, so newline characters are present. Without (?s) it would not match anything unless your method were written on a single line.
Matcher m = Pattern.compile(regex).matcher(text);
System.out.println(m.find() ? m.group() : "No Match found");
}
}
A short and simple piece of code to show how your regex works. Handle the exception if you want. Just put your Java file (named tryFile.java in the code above) in your project folder and run it.
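The two problems called out in the review, \\s* between keywords and the greedy \\{.*\\}, can also be reproduced on a small in-memory string. This is only a minimal sketch: the class name GreedyBraceDemo and the two sample methods are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyBraceDemo {
    // Two tiny methods in one string stand in for a source file (hypothetical input).
    static final String SOURCE =
          "public static int first(){ return 1; }\n"
        + "public static int second(){ return 2; }\n";

    // Returns the first match of the regex in SOURCE, or null if there is none.
    static String firstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(SOURCE);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \s* lets the two keywords run together; \s+ requires a separator.
        System.out.println("publicstatic".matches("public\\s*static")); // true
        System.out.println("publicstatic".matches("public\\s+static")); // false

        // Greedy .* (with DOTALL) runs on to the last } in the input.
        System.out.println(firstMatch("(?s)public\\s+static.*first\\(.*?\\)\\s*\\{.*\\}"));
        // Lazy .*? stops at the first } instead.
        System.out.println(firstMatch("(?s)public\\s+static.*first\\(.*?\\)\\s*\\{.*?\\}"));
    }
}
```

The lazy variant only helps here because the sample bodies contain no nested braces; it would cut a body short at the first inner }.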
Now I will show you how messy regexes actually are:
String methodSignature = "(\\s*((public|private|protected|static|final|abstract|synchronized|volatile)\\s+)*[\\w<>\\[\\]\\.]+\\s+\\w+\\s*\\((\\s*[\\w<>\\[\\]\\.]*\\s+\\w+\\s*,?)*\\s*\\))";
String regex = "(?s)" + methodSignature + ".*?(?="+ methodSignature + ")";
Basically, this regex matches every method, but it also has flaws. I will go through it as well as its flaws.
\\s*((public|private|protected|static|final|abstract|synchronized|volatile)\\s+)*
Matches any of the specified modifiers (each followed by at least one whitespace character) any number of times, including zero, since a method may have no modifier. (I've left the number of modifiers unlimited for the sake of simplicity. In a real parser I wouldn't allow this, nor would I use regex for such a task.)
[\\w<>\\[\\]\\.]+
This is the method's return type. It can contain word characters, <> for generic types, [] for arrays, and . for nested class notation.
\\s+\\w+\\s*
The name of the method.
\\((\\s*[\\w<>\\[\\]\\.]*\\s+\\w+\\s*,?)*\\s*\\))
An especially tricky part: the method parameters. At first you might think that this part could easily be replaced with a simpler pattern that accepts anything after the opening (. I thought so too. But then I noticed that such a pattern matches not only methods but also anonymous classes such as new Anonymous(someVariable){....}. The simplest and most efficient way to avoid this is to specify the structure of the method parameters. [\\w<>\\[\\]\\.] lists the characters a parameter type can be made of. \\s+\\w+\\s*,? means the parameter type is followed by at least one whitespace character and the parameter name; the parameter name may be followed by , if the method has more than one parameter.
So what about the flaws? The major flaw is classes defined inside methods. A method can contain class definitions. Consider this situation:
public void regexIsAGoodThing(){
//some code
new RegexIsNotSoGoodActually(){
void disappointingMethod(){
//The effort put into writing this regex was pointless because of this disappointing method.
}
};
}
This explains very well why regex is not the proper tool for this job. It is not possible to parse a method from a Java file reliably with regex, because a method can be a nested structure. A method may contain class definitions, those classes can contain methods that contain further class definitions, and so on. A regex cannot follow this unbounded nesting and fails.
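To see the failure on the nested example, the two regex strings from above can be run against it. This is a minimal sketch: the class name NestedMethodDemo, the helper firstMatch and the trailing method next() are hypothetical, added only so the lookahead has a following signature to find.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedMethodDemo {
    // methodSignature exactly as given in the answer.
    static final String SIGNATURE =
        "(\\s*((public|private|protected|static|final|abstract|synchronized|volatile)\\s+)*"
        + "[\\w<>\\[\\]\\.]+\\s+\\w+\\s*\\((\\s*[\\w<>\\[\\]\\.]*\\s+\\w+\\s*,?)*\\s*\\))";
    static final Pattern METHOD =
        Pattern.compile("(?s)" + SIGNATURE + ".*?(?=" + SIGNATURE + ")");

    // The nested example, followed by a second (hypothetical) method.
    static final String SOURCE =
          "public void regexIsAGoodThing(){\n"
        + "    //some code\n"
        + "    new RegexIsNotSoGoodActually(){\n"
        + "        void disappointingMethod(){\n"
        + "        }\n"
        + "    };\n"
        + "}\n"
        + "public void next(){\n"
        + "}\n";

    // Returns the first text the regex treats as a whole method, or null.
    static String firstMatch(String source) {
        Matcher m = METHOD.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The printed text ends inside the body of regexIsAGoodThing,
        // well before that method's closing brace.
        System.out.println(firstMatch(SOURCE));
    }
}
```

The match is cut short because the lookahead accepts the first signature-like text it meets inside the body, so the outer method is never returned as one piece.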
Another case where regex would fail is comments. In comments you can type anything.
void happyRegexing(){
return;
// public void happyRegexingIsOver(){....}
}
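The comment case can be checked the same way. In this sketch (the class name CommentDemo, the helper allMatches and the trailing method next() are hypothetical), the commented-out signature is reported as the start of a method of its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentDemo {
    // methodSignature exactly as given in the answer.
    static final String SIGNATURE =
        "(\\s*((public|private|protected|static|final|abstract|synchronized|volatile)\\s+)*"
        + "[\\w<>\\[\\]\\.]+\\s+\\w+\\s*\\((\\s*[\\w<>\\[\\]\\.]*\\s+\\w+\\s*,?)*\\s*\\))";
    static final Pattern METHOD =
        Pattern.compile("(?s)" + SIGNATURE + ".*?(?=" + SIGNATURE + ")");

    // The comment example, followed by a second (hypothetical) method.
    static final String SOURCE =
          "void happyRegexing(){\n"
        + "    return;\n"
        + "    // public void happyRegexingIsOver(){....}\n"
        + "}\n"
        + "void next(){\n"
        + "}\n";

    // Collects every piece of text the regex treats as a method.
    static List<String> allMatches(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = METHOD.matcher(source);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String match : allMatches(SOURCE)) {
            System.out.println("---");
            System.out.println(match);
        }
    }
}
```

The regex has no notion of comments, so the text after // is split off from happyRegexing and reported as a separate match.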
One more thing we cannot forget is annotations. What if the next method is annotated? The regex will match almost fine, except that it will match the annotation too. This can be avoided, but then the regex will be even larger.
public void goodDay(){
}
@Zzzzz //This annotation can be handled by making our regex even larger
public void goodNight(){
}
Another case is initializer blocks. What if a static or instance initializer block sits between two methods?
public void iWillNotDoThisAnyMore(){
}
static{
//some code
}
public void iWillNotParseCodeWithRegex(){
//end of story
}
P.S. It has another flaw: it matches new SomeClass() and everything up to the next method signature. You can work around this, but again, that would be a workaround rather than elegant code. I also haven't included end-of-file matching. Maybe I will add an edit tomorrow if you're interested. Going to sleep now, it's close to morning in Europe.
As you can see, regex is almost a good tool for most tasks. But we programmers hate the word almost. We don't even have it in our vocabularies, do we?
I had to modify this answer for my own needs. I wanted capture groups for the entire method as well as the name of each method in the file; those are the only two capture groups I need. This requires the single-line (s) flag in PCRE. The global (g) flag is also needed in other regex engines to capture every match in the file and not just the first one. I nested the bracket capture @SamWhan showed to allow five levels of nesting. This should get the job done, as deeper nesting goes against most recommended standards. The nesting makes this regex really expensive, so be warned.
(?:public|private|protected|static|final|abstract|synchronized|volatile)\s*(?:(?:(?:\w*\s)?(\w+))|)\(.*?\)\s*(?:\{(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*}|.)*?[^{}]*}|.)*?[^{}]*}|.)*?[^{}]*}|.)*?[^{}]*}|.)*?[^{}]*}|.)*?})
I decided to take it one step further ;)
Here's a regex that'll give you the modifiers, type, name and body of a function in different capture groups:
((?:(?:public|private|protected|static|final|abstract|synchronized|volatile)\s+)*)
\s*(\w+)\s*(\w+)\(.*?\)\s*({(?:{[^{}]*}|.)*?})
It handles nested braces (@callOfCode it is (semi-)possible with regex ;) and a fixed set of modifiers.
It doesn't handle complicated stuff like braces inside comments and stuff like that, but it'll work for the simplest ones.
Regards
Edit: And to answer your question ;), what you're interested in is capture group 4.
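For Java, the four groups can be read out roughly as follows. This is a sketch under a few assumptions: the two regex lines above are joined into one pattern, { and } are escaped because java.util.regex can reject a bare { as an illegal repetition, DOTALL is enabled so the dot crosses line breaks, and the class name CaptureGroupsDemo and helper parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupsDemo {
    // Groups: 1 = modifiers, 2 = return type, 3 = method name, 4 = body.
    static final Pattern METHOD = Pattern.compile(
          "((?:(?:public|private|protected|static|final|abstract|synchronized|volatile)\\s+)*)"
        + "\\s*(\\w+)\\s*(\\w+)\\(.*?\\)\\s*(\\{(?:\\{[^{}]*\\}|.)*?\\})",
        Pattern.DOTALL);

    // The method from the question.
    static final String SOURCE =
          "public static FieldsConfig getFieldsConfig(){\n"
        + "    if(xxx) {\n"
        + "        sssss;\n"
        + "    }\n"
        + "    return;\n"
        + "}\n";

    // Returns {modifiers, type, name, body} for the first method found, or null.
    static String[] parse(String source) {
        Matcher m = METHOD.matcher(source);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse(SOURCE);
        System.out.println("modifiers: " + parts[0]);
        System.out.println("type:      " + parts[1]);
        System.out.println("name:      " + parts[2]);
        System.out.println("body:      " + parts[3]);
    }
}
```

The body group comes back complete here only because the sample nests braces one level deep inside the method body.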
Edit 2: As I said, simple ones. But you could make it more complicated to handle more complicated methods. Here's an updated version handling one more level of nesting.
((?:(?:public|private|protected|static|final|abstract|synchronized|volatile)\s+)*)
\s*(\w+)\s*(\w+)\(.*?\)\s*({(?:{[^{}]*(?:{[^{}]*}|.)*?[^{}]*}|.)*?})
And you could add another level... and another... But as someone commented, this shouldn't be done with regex. It does, however, handle simple methods.
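Adding levels by hand follows a fixed recipe, so the body part of the pattern can be generated instead of typed. The sketch below is an illustration only: the class name NestedBraceRegex and the helpers block and firstBlock are hypothetical, and braces are escaped for java.util.regex. For depth 1 and 2 it produces the body patterns of the two regexes above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedBraceRegex {
    // Builds the body pattern that tolerates `depth` levels of braces inside the body.
    static String block(int depth) {
        String inner = "\\{[^{}]*\\}"; // a block without nested braces
        for (int i = 1; i < depth; i++) {
            // Wrap the previous level in one more pair of braces.
            inner = "\\{[^{}]*(?:" + inner + "|.)*?[^{}]*\\}";
        }
        return "\\{(?:" + inner + "|.)*?\\}";
    }

    // Returns the first brace block found in source for the given depth, or null.
    static String firstBlock(String source, int depth) {
        Matcher m = Pattern.compile(block(depth), Pattern.DOTALL).matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String body = "{ if(a){ if(b){ x; } } return; }";
        String source = "void f()" + body + " void g(){ }";
        System.out.println(firstBlock(source, 1)); // cut short: two levels are too deep
        System.out.println(firstBlock(source, 2)); // the whole body of f
    }
}
```

Each extra level makes the pattern longer and slower, and whatever depth is chosen, a method nested one level deeper is still cut short.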
Regex is definitely not the best tool for that, but if you want regex, and your code is well indented, you can try with:
^(?<indent>\s*)(?<mod1>\w+)\s(?<mod2>\w+)?\s*(?<mod3>\w+)?\s*(?<return>\b\w+)\s(?<name>\w+)\((?<arg>.*?)\)\s*\{(?<body>.+?)^\k<indent>\}
It has additional named groups, which you can delete. It uses the indentation level to find the closing }.
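In Java this pattern needs two flags: MULTILINE so that ^ matches at the start of each line, and DOTALL so that the body group can span lines. A minimal sketch follows; the class name IndentRegexDemo, the helper find and the second method other() are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndentRegexDemo {
    static final Pattern METHOD = Pattern.compile(
          "^(?<indent>\\s*)(?<mod1>\\w+)\\s(?<mod2>\\w+)?\\s*(?<mod3>\\w+)?\\s*"
        + "(?<return>\\b\\w+)\\s(?<name>\\w+)\\((?<arg>.*?)\\)\\s*\\{(?<body>.+?)^\\k<indent>\\}",
        Pattern.MULTILINE | Pattern.DOTALL);

    // Two consistently indented methods; the second one is a hypothetical neighbour.
    static final String SOURCE =
          "    public static FieldsConfig getFieldsConfig(){\n"
        + "        if(xxx) {\n"
        + "            sssss;\n"
        + "        }\n"
        + "        return;\n"
        + "    }\n"
        + "    public static void other(){\n"
        + "    }\n";

    // Returns a matcher positioned on the first method, or null if none is found.
    static Matcher find(String source) {
        Matcher m = METHOD.matcher(source);
        return m.find() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = find(SOURCE);
        System.out.println("return type: " + m.group("return"));
        System.out.println("name:        " + m.group("name"));
        System.out.println("body:" + m.group("body"));
    }
}
```

The match ends at the first } that sits at the same indentation as the signature, which is why the code has to be indented consistently.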
You need to enable DOTALL mode. Then the dot will also match newline characters. Just include (?s) at the beginning of your regex.
String s = " public static FieldsConfig getFieldsConfig(){\n"
+ " if(xxx) {\n"
+ " sssss;\n"
+ " }\n"
+ " return;\n"
+"}";
Matcher m = Pattern.compile("(?s)\\s*public\\s+static\\s+\\w+?\\sgetFieldsConfig\\(\\s*\\).*").matcher(s);
m.find();
System.out.println(m.group());
The output is the whole method body, as you wanted. Without (?s) it matches only the first line. But you cannot parse Java code with regex; others have already said that. This regex will match everything from the beginning of the method signature to the end of the file. How do you match only until the end of the method body is reached? A method can contain many {....} blocks as well as many return; statements. Regex is not a magic wand.
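One possible way around this, sketched here as an illustration rather than a recommendation from the answer, is to use a regex only to locate the signature and then count braces to find the end of the body. The names MethodExtractor, endOfBlock and extract are hypothetical, and the sketch assumes there are no braces inside string literals, character literals or comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MethodExtractor {
    // Returns the index just past the } that closes the { at openIndex, or -1.
    // Assumption: braces in strings, chars and comments are NOT skipped.
    static int endOfBlock(String source, int openIndex) {
        int depth = 0;
        for (int i = openIndex; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
        }
        return -1; // unbalanced braces
    }

    // Finds "public static <type> <name>(...) {" and returns the text up to the matching }.
    static String extract(String source, String methodName) {
        Matcher m = Pattern.compile("public\\s+static\\s+\\w+\\s+"
                + Pattern.quote(methodName) + "\\s*\\([^)]*\\)\\s*\\{").matcher(source);
        if (!m.find()) {
            return null;
        }
        int end = endOfBlock(source, m.end() - 1); // m.end() - 1 is the opening {
        return end < 0 ? null : source.substring(m.start(), end);
    }

    public static void main(String[] args) {
        String source = "class A {\n"
            + "  public static FieldsConfig getFieldsConfig(){\n"
            + "    if(xxx) {\n"
            + "      sssss;\n"
            + "    }\n"
            + "    return;\n"
            + "  }\n"
            + "  public static void other(){ }\n"
            + "}\n";
        System.out.println(extract(source, "getFieldsConfig"));
    }
}
```

A real parser would still be needed to handle braces in strings and comments, annotations and other constructs.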
Try this
((?<space>\h+)public\s+static\s+[^(]+\([^)]*?\)\s*\{.*?\k<space>\})|(public\s+static\s+[^(]+\([^)]*?\)\s*\{.*?\n\})
Explanation:
We capture the method block from the keyword public to the closing }. public and } must be preceded by the same whitespace, so your code must be well formatted : ) https://en.wikipedia.org/wiki/Indent_style
\h : matches whitespace but not newlines
(?<space>\h+) : captures all whitespace before public into the group named space
public\s+static\s : matches public static
[^(] : any character except (
\([^)] : ( followed by any character except )
\k<space>\} : the same whitespace as captured in space, then } at the end
Demo
Input:
public static FieldsConfig getFieldsConfig(){
if(xxx) {
sssss;
}
return;
}
NO CAPTURE
public static FieldsConfig getFieldsConfig2(){
if(xxx) {
sssss;
}
return;
}
NO CAPTURE
public static FieldsConfig getFieldsConfig3(){
if(xxx) {
sssss;
}
return;
}
NO CAPTURE
public static FieldsConfig getFieldsConfig4(){
if(xxx) {
sssss;
}
return;
}
Output:
MATCH 1
3. [0-91] `public static FieldsConfig getFieldsConfig(){
if(xxx) {
sssss;
}
return;
}`
MATCH 2
3. [105-197] `public static FieldsConfig getFieldsConfig2(){
if(xxx) {
sssss;
}
return;
}`
MATCH 3
1. [211-309] ` public static FieldsConfig getFieldsConfig3(){
if(xxx) {
sssss;
}
return;
}`
MATCH 4
1. [324-428] ` public static FieldsConfig getFieldsConfig4(){
if(xxx) {
sssss;
}
return;
}`