Taking substrings of a string is a very common string manipulation operation, but I heard that there might be considerable differences in performance/implementation between Java and .NET.
In .NET, Substring is O(n) rather than the O(1) of Java. This is because in .NET, the String object contains all the actual character data itself1 - so taking a substring involves copying all the data within the new substring. In Java, substring can just create a new object referring to the original char array, with a different starting index and length.
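To make the array-sharing idea concrete, here is a simplified sketch of that approach. It is not the actual java.lang.String source; the class name, field names and lack of bounds checking are all illustrative.

```java
// Simplified sketch of an array-sharing string: a "substring" is just a new
// view (offset + length) over the same char[], so no character data is copied.
final class SharedString {
    private final char[] value;   // backing array, shared between views
    private final int offset;     // where this string starts within value
    private final int length;     // how many chars belong to this string

    SharedString(char[] value, int offset, int length) {
        this.value = value;
        this.offset = offset;
        this.length = length;
    }

    // O(1): no copying, just a new object pointing into the same array.
    SharedString substring(int begin, int end) {
        return new SharedString(value, offset + begin, end - begin);
    }

    char charAt(int index) {
        return value[offset + index];
    }

    int length() {
        return length;
    }
}
```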
There are pros and cons of each approach:

- The Java approach makes taking a substring very cheap, but it means that a small substring can keep a large char[] from being garbage collected (the pitfall is sketched below).
- The .NET approach keeps each string self-contained, which is a particularly good fit for small strings2 and avoids that garbage collection issue. I believe in some cases it can make interop very easy too, internally.

There's a little more detail in my strings article.
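Here is a hedged example of the garbage collection pitfall. It assumes an implementation where substring shares the original char[] (as older Java versions did); the class and helper names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringRetention {
    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            // Stand-in for reading a large record from a file or network.
            String hugeLine = readHugeLine(i);

            // Under an array-sharing implementation, this 8-char key can keep
            // the entire backing char[] of hugeLine reachable.
            keys.add(hugeLine.substring(0, 8));

            // Classic workaround for such implementations: force a copy so that
            // only the 8 characters are retained.
            // keys.add(new String(hugeLine.substring(0, 8)));
        }
        System.out.println("Stored " + keys.size() + " keys");
    }

    // Builds a large string whose first 8 characters act as a key.
    private static String readHugeLine(int i) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%08d", i));
        for (int j = 0; j < 100_000; j++) {
            sb.append('x');
        }
        return sb.toString();
    }
}
```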
As for the general question of avoiding performance pitfalls, I think I should have a canned answer ready to cut and paste: make sure your architecture is efficient, and implement it in the most readable way you can. Measure the performance, and optimise where you find bottlenecks.
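If you do want to measure something like substring cost, a rough timing sketch is below. A proper benchmark harness such as JMH would give far more trustworthy numbers; this only shows the "measure first" habit, and the class name and sizes are arbitrary.

```java
public class SubstringTiming {

    // Accumulator so the JIT can't optimise the substring calls away entirely.
    private static long sink;

    public static void main(String[] args) {
        // Build a reasonably large source string (String.repeat needs Java 11+).
        String source = "x".repeat(100_000);

        // Warm up so the JIT has had a chance to compile the hot path.
        for (int i = 0; i < 10; i++) {
            timePerCallNanos(source);
        }

        System.out.println("substring of half the source took roughly "
                + timePerCallNanos(source) + " ns per call (sink=" + sink + ")");
    }

    // Returns an approximate per-call cost in nanoseconds.
    private static long timePerCallNanos(String source) {
        int iterations = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += source.substring(0, source.length() / 2).length();
        }
        return (System.nanoTime() - start) / iterations;
    }
}
```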
1 Incidentally, this makes string very special - it's the only non-array type whose memory footprint varies by instance within the same CLR.
2 For small strings, this is a big win. It's bad enough that there's all the overhead of one object, but when there's an extra array involved as well, a single-character string could take around 36 bytes in Java. (That's a "finger-in-the-air" number - I can't remember the exact object overheads. It will also depend on the VM you're using.)
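For a sense of where a number in that region could come from (these figures are assumed for illustration, not measured): roughly 16 bytes for the String object itself (object header plus the array reference, offset and length fields), plus roughly 16 bytes for the char[] object's header and length, plus 2 bytes for the single character, each rounded up to the VM's alignment, lands comfortably in the mid-30s.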