What is the best way to match a fully qualified Java class name in text?
Examples: java.lang.Reflect, java.util.ArrayList, org.hiber
I'd say something like ([\w]+\.)*[\w]+
But maybe I could be more specific if I knew what you want to do with it ;)
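A minimal sketch of how that pattern could be used to pull candidates out of free text with Matcher.find(). The class and method names (DottedNameFinder, findAll) are hypothetical. Because the dotted prefix is optional, plain words are reported as well, so some post-filtering is probably needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every match of ([\w]+\.)*[\w]+ found in a piece of text.
class DottedNameFinder {
    // The regex from the answer, escaped for a Java string literal.
    static final Pattern DOTTED = Pattern.compile("([\\w]+\\.)*[\\w]+");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOTTED.matcher(text);
        // find() scans forward for the next match instead of requiring the whole input to match.
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Undotted words such as "Use" and "here" are matched too.
        System.out.println(findAll("Use java.util.ArrayList here"));
    }
}
```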
A shorter version of a working regexp:
\p{Alnum}[\p{Alnum}._]+\p{Alnum}
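To see what the shorter pattern accepts, this sketch runs it with matches() over a few sample strings. AlnumPatternCheck and accepts are illustrative names. In java.util.regex, \p{Alnum} is ASCII-only by default, so the pattern needs at least three characters, rejects the $ of nested class names such as Map$Entry, and also accepts consecutive dots such as a..b.

```java
import java.util.regex.Pattern;

// Illustrative check of \p{Alnum}[\p{Alnum}._]+\p{Alnum} against whole strings.
class AlnumPatternCheck {
    // By default \p{Alnum} means [a-zA-Z0-9] in java.util.regex.
    static final Pattern SHORT = Pattern.compile("\\p{Alnum}[\\p{Alnum}._]+\\p{Alnum}");

    static boolean accepts(String s) {
        return SHORT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"java.util.ArrayList", "org.my_app.Main", "a..b", "C", "Map$Entry"};
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```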
For a string like com.mycompany.core.functions.CustomFunction I'm using ((?:(?:\w+)?\.[a-z_A-Z]\w+)+)
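A sketch of that pattern applied with Matcher.find(); DotRequiredFinder is a hypothetical name. Each repetition has to contain a dot followed by a letter or underscore and at least one more word character, so a bare name such as String is not reported, and neither is a name with a one-character segment after a dot, such as a.b.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper around ((?:(?:\w+)?\.[a-z_A-Z]\w+)+).
class DotRequiredFinder {
    static final Pattern P = Pattern.compile("((?:(?:\\w+)?\\.[a-z_A-Z]\\w+)+)");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            // Group 1 wraps the whole repeated sequence.
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The trailing "String." is skipped: no name follows its dot.
        System.out.println(findAll("Call com.mycompany.core.functions.CustomFunction, not String."));
    }
}
```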
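The pattern provided by Renaud works, but his original answer will always backtrack at the end: the repeated group ([\w]+\.)* first consumes the last segment and only gives it back when it finds no dot after it.
To optimize it, you can essentially swap the first half with the last. Note that the dot match has to change too: it moves to the front of the repeated group.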
The following is my version of it that, when compared to the original, runs about twice as fast:
String ID_PATTERN = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";
Pattern FQCN = Pattern.compile(ID_PATTERN + "(\\." + ID_PATTERN + ")*");
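A quick sketch to check that swapping the halves does not change which strings are accepted. It compares Renaud's ([\w]+\.)*[\w]+ with the rearranged \w+(\.\w+)* on a few inputs (SwapEquivalence is an illustrative name; no timings are measured here). In the rearranged form every iteration of the group starts with a literal dot, so an iteration can fail on its first character instead of after consuming a whole segment.

```java
import java.util.regex.Pattern;

// Illustrative comparison of the original and the rearranged pattern.
class SwapEquivalence {
    // Original form: repeated "segment." prefix, then a final segment.
    static final Pattern ORIGINAL = Pattern.compile("([\\w]+\\.)*[\\w]+");
    // Rearranged form: first segment, then repeated ".segment" suffix.
    static final Pattern SWAPPED = Pattern.compile("\\w+(\\.\\w+)*");

    static boolean sameVerdict(String s) {
        return ORIGINAL.matcher(s).matches() == SWAPPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"java.lang.Reflect", "java.util.ArrayList", "C", ".C", "b..C", "a.b."};
        for (String s : samples) {
            System.out.println(s + " -> " + SWAPPED.matcher(s).matches()
                    + ", same verdict as original: " + sameVerdict(s));
        }
    }
}
```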
I cannot write comments, so I decided to write an answer instead.
Here is a fully working class with tests, based on the excellent comment from @alan-moore:
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import java.util.regex.Pattern;
import org.junit.Test;

public class ValidateJavaIdentifier {
    private static final String ID_PATTERN = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";
    private static final Pattern FQCN = Pattern.compile(ID_PATTERN + "(\\." + ID_PATTERN + ")*");

    public static boolean validateJavaIdentifier(String identifier) {
        return FQCN.matcher(identifier).matches();
    }

    @Test
    public void testJavaIdentifier() throws Exception {
        assertTrue(validateJavaIdentifier("C"));
        assertTrue(validateJavaIdentifier("Cc"));
        assertTrue(validateJavaIdentifier("b.C"));
        assertTrue(validateJavaIdentifier("b.Cc"));
        assertTrue(validateJavaIdentifier("aAa.b.Cc"));
        assertTrue(validateJavaIdentifier("a.b.Cc"));
        // After the initial character, identifiers may use any combination of
        // letters, digits, underscores or dollar signs
        assertTrue(validateJavaIdentifier("a.b.C_c"));
        assertTrue(validateJavaIdentifier("a.b.C$c"));
        assertTrue(validateJavaIdentifier("a.b.C9"));
        assertFalse("cannot start with a dot", validateJavaIdentifier(".C"));
        assertFalse("cannot have two dots following each other",
                validateJavaIdentifier("b..C"));
        assertFalse("cannot start with a number",
                validateJavaIdentifier("b.9C"));
    }
}
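The class above validates a whole string with matches(). To find names inside running text, a variant could use Matcher.find() instead. This is a sketch, not part of the answer above: changing * to + (so at least one dot is required) is my assumption, made so that ordinary words are skipped, and FqcnExtractor and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor built on the same identifier classes as ValidateJavaIdentifier.
class FqcnExtractor {
    private static final String ID_PATTERN = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";
    // Assumption: "+" instead of "*" so that at least one dot is required.
    private static final Pattern QUALIFIED = Pattern.compile(ID_PATTERN + "(\\." + ID_PATTERN + ")+");

    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUALIFIED.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("Replace java.util.Vector with java.util.ArrayList."));
    }
}
```

A sentence-ending dot is left out of the match because no identifier follows it, and $ is an identifier part, so nested names such as java.util.Map$Entry come back whole.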
I came up (on my own) with an answer similar to Tomalak's, of the form M.M.M.N:
([a-z][a-z_0-9]*\.)*[A-Z_]($[A-Z_]|[\w_])*
Where,
M = ([a-z][a-z_0-9]*\.)*
N = [A-Z_]($[A-Z_]|[\w_])*
However, unlike Tomalak's answer, this regular expression makes more assumptions:
The package name (the M part) is all lower case: the first character of each package segment is always a lowercase letter, and the rest can mix underscores, lowercase letters and digits.
The class name (the N part) always starts with an uppercase letter or an underscore, and the rest can mix underscores, letters and digits. Inner classes always start with a dollar sign ($) and must follow the same class name rules.
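Note: the pattern is written in XSD regular expression syntax. There, \w does not include the underscore (_), hence the explicit _ in [\w_], and $ is an ordinary character. In Java's java.util.regex, \w already includes the underscore and $ is an end-of-input anchor, so a Java version has to escape the dollar as \$.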
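A sketch of a Java port of the M.M.M.N pattern; PackageClassPattern and its field names are illustrative. The only change is escaping the dollar as \$. The copy kept as written shows that, in java.util.regex, the bare $ acts as an anchor, so nested names like java.util.Map$Entry are rejected.

```java
import java.util.regex.Pattern;

// Illustrative comparison of the XSD-style pattern and an escaped Java port.
class PackageClassPattern {
    // Copied as written: in Java, the bare $ is an end-of-input anchor, not a literal dollar.
    static final Pattern AS_WRITTEN = Pattern.compile("([a-z][a-z_0-9]*\\.)*[A-Z_]($[A-Z_]|[\\w_])*");
    // Java port (assumption): \$ matches a literal dollar; [\w_] is redundant in Java but harmless.
    static final Pattern JAVA_PORT = Pattern.compile("([a-z][a-z_0-9]*\\.)*[A-Z_](\\$[A-Z_]|[\\w_])*");

    public static void main(String[] args) {
        String[] samples = {"java.util.ArrayList", "java.util.Map$Entry", "java.util.arraylist", "Java.util.List"};
        for (String s : samples) {
            System.out.println(s + " -> as written: " + AS_WRITTEN.matcher(s).matches()
                    + ", Java port: " + JAVA_PORT.matcher(s).matches());
        }
    }
}
```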
Hope this helps.