I have this kind of input:
word w\'ord wo\'rd
I need to convert to uppercase both the character at the start of the word and the character right after the '
Not as elegant as @Wiktor Stribizew's post, but here is an attempt to do it without regex:
public class HelloWorld {
    public static void main(String[] args) {
        String s = "word w'ord wo'r'd";
        System.out.println(upperCase(s, '\''));
        // prints: word W'Ord Wo'R'D
    }

    // Uppercases the first character of every whitespace-separated word that
    // contains the delimiter, and every character directly after the delimiter.
    private static String upperCase(String originalString, char delimiter) {
        StringBuilder result = new StringBuilder(originalString);
        int wordStart = 0; // index where the current word begins
        for (int i = 0; i < result.length(); i++) {
            char c = result.charAt(i);
            if (Character.isWhitespace(c)) {
                wordStart = i + 1;
            } else if (c == delimiter) {
                // Uppercase the first character of the word that holds the delimiter.
                result.setCharAt(wordStart, Character.toUpperCase(result.charAt(wordStart)));
                // Uppercase the character after the delimiter, if there is one.
                if (i + 1 < result.length()) {
                    result.setCharAt(i + 1, Character.toUpperCase(result.charAt(i + 1)));
                }
            }
        }
        return result.toString();
    }
}
You need to use Matcher#appendReplacement in Java to be able to process the match. Here is an example:
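import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "word w'ord wo'rd";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)").matcher(s);
while (m.find()) {
    // Group 1 is the first letter of the word, group 3 is everything after the first quote
    m.appendReplacement(result,
        m.group(1).toUpperCase() + m.group(2) + "'" + m.group(3).toUpperCase());
}
m.appendTail(result); // copy the text after the last match
System.out.println(result.toString());
// => word W'Ord Wo'Rd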
Java 9+ equivalent:
String s = "wo'rd w'ord wo'r'd";
Matcher m = Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)").matcher(s);
System.out.println(
    m.replaceAll(r -> r.group(1).toUpperCase() + r.group(2) + "'" + r.group(3).toUpperCase())
);
// wo'rd w'ord wo'r'd => Wo'Rd W'Ord Wo'R'D
// word w'ord wo'rd => word W'Ord Wo'Rd
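To see what each capture group holds for a given word, a small sketch along these lines may help. The class name GroupInspector and the helper describe are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupInspector {
    // Same pattern as in the answer; the names in this class are hypothetical.
    private static final Pattern P = Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)");

    // Returns "group1|group2|group3" for the first match, or null if nothing matches.
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(describe("word"));   // null (no quote, so no match)
        System.out.println(describe("w'ord"));  // w||o
        System.out.println(describe("wo'r'd")); // w|o|r'd
    }
}
```

Group 2 is empty when the quote directly follows the first letter, and group 3 stops at the first character that is not part of a quote-plus-word-char sequence, which is why the trailing "rd" in w'ord is left untouched by the replacement.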
Pattern break-down:
\b - a leading word boundary
(\w) - Group 1: a single word char
(\w*) - Group 2: zero or more word chars
' - a single quote
(\w(?:'\w)*) - Group 3:
  \w - a word char
  (?:'\w)* - zero or more sequences of:
    ' - a single quote
    \w - a word char.

Now, if you want to make the pattern more precise, you can replace the \w that are supposed to match lowercase letters with \p{Ll} and the \w that is supposed to match any letter with \p{L}. The pattern would look like "(?U)\\b(\\p{Ll})(\\p{L}*)'(\\p{Ll}(?:'\\p{Ll})*)" - however, you risk leaving letters in lowercase (those after ') if there are uppercase letters before lowercase ones (like in wo'r'D's -> Wo'R'D's). (?U) is the inline modifier for Pattern.UNICODE_CHARACTER_CLASS, which makes the \b word boundary Unicode-aware.
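A minimal sketch of the stricter pattern in use, including the caveat with wo'r'D's, could look like this. The class name UnicodeQuoteCaps and the method capitalize are hypothetical, and using Locale.ROOT for the case conversion is an assumption made here to keep the result independent of the default locale.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeQuoteCaps {
    // Stricter pattern from the paragraph above; (?U) turns on UNICODE_CHARACTER_CLASS.
    private static final Pattern STRICT =
            Pattern.compile("(?U)\\b(\\p{Ll})(\\p{L}*)'(\\p{Ll}(?:'\\p{Ll})*)");

    // Hypothetical helper: applies the same appendReplacement loop as the answer.
    static String capitalize(String input) {
        Matcher m = STRICT.matcher(input);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(result,
                    m.group(1).toUpperCase(Locale.ROOT) + m.group(2) + "'"
                            + m.group(3).toUpperCase(Locale.ROOT));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("wo'rd"));    // Wo'Rd
        System.out.println(capitalize("wo'r'D's")); // Wo'R'D's (the final s stays lowercase)
        // Non-ASCII lowercase letters are matched too (written with Unicode escapes).
        System.out.println(capitalize("\u00e9'\u00e9t\u00e9"));
    }
}
```

In wo'r'D's the match ends at the uppercase D because \p{Ll} only accepts lowercase letters, so the s after the last quote is never reached by group 3.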