RegEx, StringBuilder and Large Object Heap Fragmentation

悲&欢浪女 2021-02-02 01:28

How can I run lots of RegExes (to find matches) in big strings without causing LOH fragmentation?

It's .NET Framework 4.0 so I'm using StringBuilder so it

3 Answers
  •  死守一世寂寞
    2021-02-02 01:42

    OK, here is my attempt to solve this problem in a fairly generic way, though with some obvious limitations. Since I haven't seen this advice anywhere and everyone is whining about LOH fragmentation, I wanted to share the code to confirm that my design and assumptions are correct.

    Theory:

    1. Create a shared, massive StringBuilder to hold the big strings that we read from streams: new StringBuilder(ChunkSize * 5);
    2. Create a massive string (it has to be bigger than the maximum accepted size), initialized with spaces: new string(' ', ChunkSize * 10);
    3. Pin the string object in memory so the GC will not mess with it: GCHandle.Alloc(pinnedText, GCHandleType.Pinned). Even though LOH objects are normally not moved by the GC, this seems to improve performance, maybe because of the unsafe code.
    4. Read the stream into the shared StringBuilder, then copy it into pinnedText with unsafe code, character by character through the indexers.
    5. Pass pinnedText to the RegEx.

    With this implementation the code below behaves as if there were no LOH allocation at all. If I switch to new string(' ') allocations instead of the static StringBuilder, or use StringBuilder.ToString(), the code can allocate 300% less memory before crashing with an OutOfMemoryException.
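
    To see why StringBuilder.ToString() matters here, the following is a minimal sketch (not part of the original code) that checks where a freshly created string ends up. It assumes the usual LOH threshold of roughly 85,000 bytes and relies on large objects being reported as the highest generation by GC.GetGeneration; the class name LohCheck is made up for illustration.

    ```csharp
    using System;
    using System.Text;

    // Hypothetical helper, for illustration only.
    internal static class LohCheck
    {
        private static void Main()
        {
            // 100,000 chars is about 200 KB, well above the assumed ~85,000-byte LOH threshold.
            var sb = new StringBuilder(100000);
            sb.Append('x', 100000);

            string small = new string('x', 1000); // small-object heap
            string big = sb.ToString();           // a brand new large string on every call

            // A freshly allocated small object normally starts in generation 0.
            Console.WriteLine("small generation: " + GC.GetGeneration(small));

            // Large objects are expected to report GC.MaxGeneration straight away.
            Console.WriteLine("big generation:   " + GC.GetGeneration(big));
            Console.WriteLine("big is in max generation: " +
                              (GC.GetGeneration(big) == GC.MaxGeneration));
        }
    }
    ```

    Every ToString() call on a big StringBuilder produces another large string like big above, which is the allocation pattern the reused pinnedText buffer is meant to avoid.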

    I also confirmed with a memory profiler that there is no LOH fragmentation in this implementation. I still don't understand why RegEx doesn't cause any unexpected problems. I also tested with different, more expensive RegEx patterns and the results are the same: no fragmentation.

    Code:

    http://pastebin.com/ZuuBUXk3

    using System;
    using System.Collections.Generic;
    using System.Runtime.InteropServices;
    using System.Text;
    using System.Text.RegularExpressions;
    
    namespace LOH_RegEx
    {
        internal class Program
        {
            private static List<string> storage = new List<string>();
            private const int ChunkSize = 100000;
            private static StringBuilder _sb = new StringBuilder(ChunkSize * 5);
    
    
            private static void Main(string[] args)
            {
                var pinnedText = new string(' ', ChunkSize * 10);
                var sourceCodePin = GCHandle.Alloc(pinnedText, GCHandleType.Pinned);
    
                var rgx = new Regex("A", RegexOptions.CultureInvariant | RegexOptions.Compiled);
    
                try
                {
    
                    for (var i = 0; i < 30000; i++)
                    {                   
                        //Simulate that we read data from stream to SB
                        UpdateSB(i);
                        CopyInto(pinnedText);                   
                        var rgxMatch = rgx.Match(pinnedText);
    
                        if (!rgxMatch.Success)
                        {
                            Console.WriteLine("RegEx failed!");
                            Console.ReadLine();
                        }
    
                        //Extra buffer to fragment LoH
                        storage.Add(new string('z', 50000));
                        if ((i%100) == 0)
                        {
                            Console.Write(i + ",");
                        }
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.ToString());
                    Console.WriteLine("OOM Crash!");
                    Console.ReadLine();
                }
                finally
                {
                    // Release the pin once the loop is done
                    sourceCodePin.Free();
                }
            }
    
    
            private static unsafe void CopyInto(string text)
            {
                fixed (char* pChar = text)
                {
                    int i;
                    for (i = 0; i < _sb.Length; i++)
                    {
                        pChar[i] = _sb[i];
                    }
    
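                    // i == _sb.Length here, so this marks the position right after the copied text
                    // (the string's Length itself does not change)
                    pChar[i] = '\0';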
                }
            }
    
            private static void UpdateSB(int extraSize)
            {
                _sb.Remove(0,_sb.Length);
    
                var rnd = new Random();
                for (var i = 0; i < ChunkSize + extraSize; i++)
                {
                    _sb.Append((char)rnd.Next(60, 80));
                }
            }
        }
    }
    
