Comparison of substring operation performance between .NET and Java

后端 未结 4 893
囚心锁ツ
囚心锁ツ 2020-12-19 02:52

Taking substrings of a string is a very common string manipulation operation, but I heard that there might be considerable differences in performance/implementation between

4条回答
  •  囚心锁ツ
    2020-12-19 03:36

    In .NET, Substring is O(n) rather than the O(1) of Java. This is because in .NET, the String object contains all the actual character data itself1 - so taking a substring involves copying all the data within the new substring. In Java, substring can just create a new object referring to the original char array, with a different starting index and length.

    There are pros and cons of each approach:

    • .NET's approach has better cache coherency, creates fewer objects2, and avoids the situation where one small substring prevents a very large char[] being garbage collected. I believe in some cases it can make interop very easy too, internally.
    • Java's approach makes taking a substring very efficient, and probably some other operations too

    There's a little more detail in my strings article.

    As for the general question of avoiding performance pitfalls, I think I should have a canned answer ready to cut and paste: make sure your architecture is efficient, and implement it in the most readable way you can. Measure the performance, and optimise where you find bottlenecks.


    1 Incidentally, this makes string very special - it's the only non-array type whose memory footprint varies by instance within the same CLR.

    2 For small strings, this is a big win. It's bad enough that there's all the overhead of one object, but when there's an extra array involved as well, a single-character string could take around 36 bytes in Java. (That's a "finger-in-the-air" number - I can't remember the exact object overheads. It will also depend on the VM you're using.)

提交回复
热议问题