Comparison of substring operation performance between .NET and Java

囚心锁ツ 2020-12-19 02:52

Taking substrings of a string is a very common string manipulation operation, but I heard that there might be considerable differences in performance/implementation between .NET and Java.

4 Answers
  • 2020-12-19 03:36

    In .NET, Substring is O(n) rather than the O(1) of Java. This is because in .NET, the String object contains all the actual character data itself [1], so taking a substring involves copying all the data within the new substring. In Java, substring can just create a new object referring to the original char array, with a different starting index and length.
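
    To make the difference concrete, here is a minimal Java sketch of the two strategies. The names `SubstringStrategies`, `copyRange` and `SharedView` are hypothetical and are not taken from either runtime's source; they only illustrate copying versus sharing. Note that the sharing behaviour applies to older JDKs: OpenJDK's `String.substring` has copied the characters since Java 7 update 6, so on current JVMs it is O(n) as well.

```java
// Hypothetical illustration only: neither strategy is the real String implementation.
final class SubstringStrategies {

    // Copying strategy (what the answer describes for .NET): cost grows with `length`.
    static char[] copyRange(char[] source, int start, int length) {
        char[] result = new char[length];
        System.arraycopy(source, start, result, 0, length); // O(length) copy
        return result;
    }

    // Sharing strategy (what the answer describes for Java): constant time, no copy.
    static final class SharedView {
        final char[] backing; // the parent's array itself, not a copy
        final int offset;     // where this view starts inside `backing`
        final int count;      // how many chars this view exposes

        SharedView(char[] backing, int offset, int count) {
            if (offset < 0 || count < 0 || offset > backing.length - count) {
                throw new IndexOutOfBoundsException();
            }
            this.backing = backing;
            this.offset = offset;
            this.count = count;
        }

        // O(1): only a small object is allocated; no characters are copied.
        SharedView substring(int start, int length) {
            if (start < 0 || length < 0 || start > count - length) {
                throw new IndexOutOfBoundsException();
            }
            return new SharedView(backing, offset + start, length);
        }

        @Override
        public String toString() {
            return new String(backing, offset, count);
        }
    }
}
```

    With the view, the cost of `substring` is independent of the length requested; with the copy, it is proportional to it.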

    There are pros and cons of each approach:

    • .NET's approach has better cache locality, creates fewer objects [2], and avoids the situation where one small substring prevents a very large char[] being garbage collected. I believe in some cases it can make interop very easy too, internally.
    • Java's approach makes taking a substring very efficient, and probably some other operations too.
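
    The garbage-collection point in the first bullet can be observed with `java.nio.CharBuffer`, whose `subSequence` shares the parent's storage much like the sharing substring described above. This is only an analogy, assuming an array-backed buffer created with `CharBuffer.wrap(char[])`; `RetentionSketch` and its method names are made up for illustration.

```java
// Sketch: a tiny shared slice keeps the whole backing array reachable.
final class RetentionSketch {

    // Returns a 1-char slice that still references all of `big`.
    static java.nio.CharBuffer sharedSlice(char[] big) {
        return java.nio.CharBuffer.wrap(big).subSequence(0, 1);
    }

    // Copies the slice into an independent String, so the large array can be
    // collected once nothing else refers to it.
    static String detachedCopy(java.nio.CharBuffer slice) {
        return slice.toString();
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        java.util.Arrays.fill(big, 'x');
        java.nio.CharBuffer slice = sharedSlice(big);
        System.out.println(slice.length());               // logical length of the slice
        System.out.println(slice.array().length);         // size of the storage it retains
        System.out.println(detachedCopy(slice).length()); // length of the independent copy
    }
}
```

    Holding on to `slice` alone keeps the full array alive, which is the cost of the sharing approach; `detachedCopy` pays for a copy to release it.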

    There's a little more detail in my strings article.

    As for the general question of avoiding performance pitfalls, I think I should have a canned answer ready to cut and paste: make sure your architecture is efficient, and implement it in the most readable way you can. Measure the performance, and optimise where you find bottlenecks.


    [1] Incidentally, this makes string very special - it's the only non-array type whose memory footprint varies by instance within the same CLR.

    [2] For small strings, this is a big win. It's bad enough that there's all the overhead of one object, but when there's an extra array involved as well, a single-character string could take around 36 bytes in Java. (That's a "finger-in-the-air" number - I can't remember the exact object overheads. It will also depend on the VM you're using.)

  • 2020-12-19 03:45

    Using Reflector, this is what you get for Substring(Int32, Int32):

    [SecuritySafeCritical, TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
    public string Substring(int startIndex, int length)
    {
        return this.InternalSubStringWithChecks(startIndex, length, false);
    }
    

    If you keep drilling down, the last call is to

    internal static unsafe void wstrcpy(char* dmem, char* smem, int charCount)
    

    which copies the chars using pointers. The complete code looks long, but you won't know how fast or slow it is until you run it and benchmark it.
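
    As a rough idea of what "run it and benchmark it" could look like, here is a crude timing sketch in Java. `SubstringTiming` and `timeSubstrings` are hypothetical names, and a naive `System.nanoTime` loop like this is easily distorted by JIT warm-up and garbage collection, so a proper harness (JMH on the JVM, for instance) would be preferable for real measurements. The same shape of test can be written against .NET's `Substring`.

```java
// Crude timing sketch; treat the numbers as indicative at best.
final class SubstringTiming {

    // Times `iterations` substring calls of the given length; returns elapsed nanoseconds.
    static long timeSubstrings(String text, int length, int iterations) {
        long sink = 0; // use the results so the work is less likely to be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += text.substring(0, length).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink != (long) length * iterations) {
            throw new IllegalStateException("unexpected checksum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        java.util.Arrays.fill(chars, 'x');
        String text = new String(chars);
        int iterations = 10_000;
        for (int length : new int[] {10, 1_000, 100_000}) {
            timeSubstrings(text, length, iterations); // warm-up pass
            long nanos = timeSubstrings(text, length, iterations);
            System.out.println(length + " chars: " + (nanos / iterations) + " ns per call");
        }
    }
}
```

    If the per-call time grows with the substring length, the implementation is copying; if it stays flat, it is sharing.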

  • 2020-12-19 03:48

    According to this, not really: C# Substring

  • 2020-12-19 03:55

    It really depends on your workload. If you're looping and doing lots of substring calls, then you might have a problem. For the SO post you're referring to, I doubt it would ever be a problem. With that attitude, however, you could always wind up in a "death by a thousand paper cuts" situation. In the SO post you refer to, we have the following:

    String after = before.Substring(0, 1).ToUpper() + before.Substring(1);
    

    Assuming the compiler doesn't do some crazy optimizations, this will create at least four new strings (2 Substring calls, a ToUpper call, and the concatenation). Substring is implemented exactly as you'd expect (string copy), but three of those strings allocated above will quickly become garbage. Doing a lot of this will create unnecessary memory pressure. I say "unnecessary" because you can probably come up with a more economical solution with only a little more time investment.
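
    One possible "more economical" version is sketched below in Java; the same idea carries over to C# with a char array. `CapitalizeSketch` and `capitalizeFirst` are hypothetical names. It assumes that upper-casing a single `char` is acceptable, which is not quite the same as upper-casing a string: characters whose upper-case form is longer than one char (such as 'ß') and surrogate pairs are left unchanged here.

```java
// Sketch of a lower-allocation alternative to substring + upper-case + concatenation.
final class CapitalizeSketch {

    static String capitalizeFirst(String before) {
        if (before.isEmpty()) {
            return before;
        }
        char first = before.charAt(0);
        char upper = Character.toUpperCase(first);
        if (upper == first) {
            return before; // nothing to change, so nothing is allocated
        }
        char[] chars = before.toCharArray(); // one temporary array
        chars[0] = upper;
        return new String(chars); // the only new String created
    }
}
```

    This produces one temporary array and one result string instead of three short-lived intermediate strings, and no allocation at all when the first character is already upper case.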

    In the end, the profiler is your best friend :)
