I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :)
I have a String such as "thisIs
You can use a regexp with a zero-width positive lookahead. It finds the uppercase letters but doesn't consume them as part of the delimiter:
String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");
Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses it as a delimiter.
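To see why the lookahead matters, here is a minimal sketch comparing a delimiter that consumes the uppercase letter with one that only looks ahead. The class and method names are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are assumptions for illustration only.
final class LookaheadSplitDemo {

    // Splits ON each uppercase letter, so the letter itself is consumed as the delimiter.
    static String[] splitConsuming(String s) {
        return s.split("\\p{Upper}");
    }

    // Splits at the empty position BEFORE each uppercase letter, so nothing is consumed.
    static String[] splitLookahead(String s) {
        return s.split("(?=\\p{Upper})");
    }

    public static void main(String[] args) {
        String s = "thisIsMyString";
        System.out.println(Arrays.toString(splitConsuming(s))); // [this, s, y, tring]
        System.out.println(Arrays.toString(splitLookahead(s))); // [this, Is, My, String]
    }
}
```

The first variant loses the capitals because they are the delimiter; the lookahead variant keeps them because the delimiter is the empty string in front of each capital.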
See the javadoc for java.util.regex.Pattern for more info on Java regexp syntax.
EDIT: By the way, this doesn't work with thisIsMyÜberString, because the POSIX class \p{Upper} matches only ASCII uppercase letters by default. For non-ASCII uppercase letters you need the Unicode uppercase category instead:
String[] r = s.split("(?=\\p{Lu})");
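The difference between the two classes can be checked with a short sketch like the one below. The class and method names are hypothetical. The third method uses the Pattern.UNICODE_CHARACTER_CLASS flag, which is not mentioned in the answer above but, as far as I know, makes \p{Upper} Unicode-aware. The Ü is written as the escape \u00DC so the example does not depend on the source file encoding.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions for illustration only.
final class UnicodeUpperDemo {

    // POSIX class: by default matches only ASCII A-Z.
    static String[] splitPosix(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // Unicode general category "uppercase letter".
    static String[] splitUnicode(String s) {
        return s.split("(?=\\p{Lu})");
    }

    // Same POSIX class, compiled with the flag that switches it to Unicode semantics.
    static String[] splitPosixWithUnicodeFlag(String s) {
        return Pattern.compile("(?=\\p{Upper})", Pattern.UNICODE_CHARACTER_CLASS).split(s);
    }

    public static void main(String[] args) {
        String s = "thisIsMy\u00DCberString";
        System.out.println(Arrays.toString(splitPosix(s)));                // 4 parts, no split before U+00DC
        System.out.println(Arrays.toString(splitUnicode(s)));              // 5 parts
        System.out.println(Arrays.toString(splitPosixWithUnicodeFlag(s))); // 5 parts
    }
}
```

With \p{Upper} alone the result is [this, Is, MyÜber, String]; with \p{Lu} it is [this, Is, My, Über, String].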
String[] camelCaseWords = s.split("(?=[A-Z])");
Try this, which compiles the pattern once and reuses it:
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");
...
A simple Scala/Java suggestion that does not split inside all-uppercase strings like NYC:
def splitAtMiddleUppercase(token: String): Iterator[String] = {
  // Each match is a run of uppercase letters followed by a run of non-uppercase characters.
  val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
  // I did not find a way to avoid empty matches in the regex itself, so they are filtered out. Open to suggestions.
  regex.findAllIn(token).filter(_ != "")
}
Test with:
val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for (example <- examples) {
  println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}
It produces:
catch22 -> [catch22]
iPhone -> [i, Phone]
eReplacement -> [e, Replacement]
TotalRecall -> [Total, Recall]
NYC -> [NYC]
JGHSD87 -> [JGHSD87]
interÜber -> [inter, Über]
Modify the regex if you want to cut at digits too.
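For Java users, a rough port of the same idea might look like the sketch below. The class name, the method names and in particular the digit-splitting regex are my own assumptions, not something given in the answer; the digit variant is just one possible way to make digit runs separate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; all names here are assumptions for illustration only.
final class AcronymAwareSplitter {

    // Same regex as the Scala version: a run of uppercase letters, then a run of anything else.
    private static final Pattern WORD = Pattern.compile("\\p{Lu}*[^\\p{Lu}]*");

    // Assumed variant: a run of digits is a token of its own, otherwise as above but stopping at digits.
    private static final Pattern WORD_OR_DIGITS =
            Pattern.compile("\\p{Nd}+|\\p{Lu}*[^\\p{Lu}\\p{Nd}]*");

    static List<String> split(String token) {
        return collect(WORD, token);
    }

    static List<String> splitWithDigits(String token) {
        return collect(WORD_OR_DIGITS, token);
    }

    // Collects every non-empty match; the regex also matches the empty string at the end of the input.
    private static List<String> collect(Pattern pattern, String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                parts.add(m.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("TotalRecall"));       // [Total, Recall]
        System.out.println(split("NYC"));               // [NYC]
        System.out.println(split("JGHSD87"));           // [JGHSD87]
        System.out.println(splitWithDigits("JGHSD87")); // [JGHSD, 87]
        System.out.println(splitWithDigits("catch22")); // [catch, 22]
    }
}
```

Because every character is either a digit, an uppercase letter or something else, both patterns always consume at least one character until the end of the input, so the only empty match is the final one that the loop skips.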
For anyone who wonders what the pattern looks like when the String to split might start with an uppercase character:
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));
gives: [This, Is, My, String]
This regex splits before every capital letter except one at the very start of the string, so it should work for both camel case and proper case.
(?<=.)(?=(\\p{Upper}))
TestText = Test, Text
thisIsATest = this, Is, A, Test
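The lookbehind (?<=.) requires at least one character before the split point, which rules out position 0. A small sketch to check the examples above follows; the class and method names are hypothetical. As far as I know, since Java 8 String.split no longer returns an empty leading element for a zero-width match at the start of the input, so the plain lookahead is expected to give the same result there, and the lookbehind mainly matters on older versions.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are assumptions for illustration only.
final class LeadingUpperSplitDemo {

    // Splits before an uppercase letter only if some character precedes it.
    static String[] splitNotAtStart(String s) {
        return s.split("(?<=.)(?=\\p{Lu})");
    }

    // Plain lookahead, for comparison; its handling of a leading capital depends on the Java version.
    static String[] splitLookaheadOnly(String s) {
        return s.split("(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNotAtStart("ThisIsMyString"))); // [This, Is, My, String]
        System.out.println(Arrays.toString(splitNotAtStart("TestText")));       // [Test, Text]
        System.out.println(Arrays.toString(splitNotAtStart("thisIsATest")));    // [this, Is, A, Test]
        System.out.println(Arrays.toString(splitLookaheadOnly("ThisIsMyString")));
    }
}
```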