We have a MySQL database that only supports utf8, but some of the data feeds we receive require utf8mb4 to be stored in MySQL. How can we detect (in Java) whether a string will require the utf8mb4 charset?
Characters that require utf8mb4 are those outside the Basic Multilingual Plane (code points above U+FFFF). Java represents each of them as a surrogate pair, which occupies 2 chars. A simple way to detect them is therefore to check whether the length of the string in chars differs from its number of code points:
    // True if s contains at least one surrogate pair, i.e. a code point above U+FFFF.
    boolean requiresMb4(String s) {
        int len = s.length();
        return len != s.codePointCount(0, len);
    }
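For anyone who wants to try this out, here is a self-contained sketch of the same check written another way: it walks the code points and stops at the first one above U+FFFF. The class name `Mb4Check` and the sample strings are only illustrative assumptions, and `String.codePoints()` needs Java 8 or later.

```java
public final class Mb4Check {
    private Mb4Check() {}

    // True if s contains any supplementary code point (above U+FFFF),
    // which takes 4 bytes in UTF-8 and so does not fit MySQL's 3-byte utf8.
    public static boolean requiresMb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";       // U+00E9 is inside the BMP
        String emoji = "ok \uD83D\uDE00";    // U+1F600, stored as a surrogate pair

        System.out.println(requiresMb4(accented)); // false
        System.out.println(requiresMb4(emoji));    // true
    }
}
```

Unlike the length comparison, this version can return as soon as it finds the first offending character, which may matter for very long strings.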