In Java I create a string that uses Unicode and overline because I am trying to display square roots of numbers. I need to know the length of the string for some formatting.
The usual methods for finding string length seem to fail.
They don't fail; they report the string length as the number of Unicode characters [*]. If you need different behaviour, you need to define clearly what you mean by "string length".
When you are interested in string lengths for display purposes, you are usually interested in counting pixels (or some other logical/physical unit), and that is the responsibility of the display layer. To begin with, different characters may have different widths if the font is not monospaced.
But if you're just interested in counting the number of graphemes ("a minimally distinctive unit of writing in the context of a particular writing system"), here's a nice guide with code and examples. Copying-trimming-pasting the relevant code from there, we'd have something like this:
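import java.text.BreakIterator;

public static int getGraphemeCount(String text) {
    int graphemeCount = 0;
    // A character-instance BreakIterator stops at user-perceived character boundaries.
    BreakIterator graphemeCounter = BreakIterator.getCharacterInstance();
    graphemeCounter.setText(text);
    // Each successful call to next() steps over one grapheme.
    while (graphemeCounter.next() != BreakIterator.DONE) {
        graphemeCount++;
    }
    return graphemeCount;
}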
Bear in mind: the above uses the default locale. A more flexible and robust method would, e.g., receive an explicit locale as an argument and invoke BreakIterator.getCharacterInstance(locale) instead.
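A minimal sketch of that locale-aware variant follows. The class name GraphemeCounter and the sample string (a square root sign followed by a 2 with a combining overline, U+0305) are only illustrative assumptions, not taken from the guide mentioned above.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, used here only for illustration.
public class GraphemeCounter {

    // Counts grapheme boundaries using a BreakIterator for the given locale.
    public static int getGraphemeCount(String text, Locale locale) {
        BreakIterator graphemeCounter = BreakIterator.getCharacterInstance(locale);
        graphemeCounter.setText(text);
        int graphemeCount = 0;
        while (graphemeCounter.next() != BreakIterator.DONE) {
            graphemeCount++;
        }
        return graphemeCount;
    }

    public static void main(String[] args) {
        // Square root sign, the digit 2, and a combining overline on the 2.
        String root = "\u221A2\u0305";
        System.out.println(root.length());                        // 3 UTF-16 code units
        System.out.println(getGraphemeCount(root, Locale.ROOT));  // 2 graphemes
    }
}
```

Passing Locale.ROOT keeps the result independent of the machine's default locale. Note that a grapheme count still says nothing about rendered width; that remains a matter for the display layer.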
[*] To be precise, as pointed out in the comments, String.length() counts Java characters (char values), which are actually code units in the UTF-16 encoding. This is equivalent to counting Unicode characters (code points) only if every character in the string is inside the BMP.
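To make the footnote concrete, here is a small sketch comparing code units with code points. The class name LengthDemo and the sample character (U+1D538, which lies outside the BMP) are assumptions chosen for illustration; String.codePointCount is a standard method.

```java
// Hypothetical demo class, used here only for illustration.
public class LengthDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D538 is outside the BMP, so UTF-16 stores it as a surrogate pair.
        String outsideBmp = "\uD835\uDD38";
        System.out.println(codeUnits(outsideBmp));   // 2
        System.out.println(codePoints(outsideBmp));  // 1

        // Inside the BMP the two counts agree.
        System.out.println(codeUnits("abc"));        // 3
        System.out.println(codePoints("abc"));       // 3
    }
}
```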