What is a safe Maximum Stack Size or How to measure use of stack?

栀梦 2020-12-15 11:12

I have an app with a number of worker threads, one for each core. On a modern 8 core machine, I have 8 of these threads. My app loads a lot of plugins, which also have their

6 Answers
  • 2020-12-15 11:45

    Use this to compute the amount of memory committed for the current thread's stack:

    function CommittedStackSize: Cardinal;
    asm
      mov eax,[fs:$4] // base of the stack, from the Thread Environment Block (TEB)
      mov edx,[fs:$8] // address of lowest committed stack page
                      // this gets lower as you use more stack
      sub eax,edx
    end;
    

    I don't have any other ideas.

  • 2020-12-15 11:51

    For the sake of completeness, I am adding a version of the CommittedStackSize function provided in opc0de's answer for determining the amount of used stack that will work both for x86 32- and 64-bit versions of Windows (opc0de's function is for Win32 only).

    opc0de's function reads the address of the base of the stack and the address of the lowest committed stack page from Windows' Thread Information Block (TIB). There are two differences between x86 and x64:

    • The TIB is pointed to by the FS segment register on Win32, but by GS on Win64
    • The absolute offsets of items in the structure differ (mostly because some items are pointers, i.e. 4 bytes and 8 bytes on Win32/64, respectively)

    Additionally, note that there is a small difference in the BASM code: on x64, abs is required to make the assembler use an absolute offset from the segment register.

    Therefore, a version that works on both Win32 and Win64 looks like this:

    {$IFDEF MSWINDOWS}
    function CommittedStackSize: NativeUInt;
    //NB: Win32 uses FS, Win64 uses GS as base for Thread Information Block.
    asm
     {$IFDEF WIN32}
      mov eax, [fs:04h] // TIB: base of the stack
      mov edx, [fs:08h] // TIB: lowest committed stack page
      sub eax, edx      // compute difference in EAX (=Result)
     {$ENDIF}
     {$IFDEF WIN64}
      mov rax, abs [gs:08h] // TIB: base of the stack
      mov rdx, abs [gs:10h] // TIB: lowest committed stack page
      sub rax, rdx          // compute difference in RAX (=Result)
     {$ENDIF}
    end;
    {$ENDIF}
    
  • 2020-12-15 11:55

    Years ago, I FillChar'd all available stack space with zeroes at startup and counted the contiguous zeroes at shutdown, starting from the end. This yields a good 'high water mark', provided you put your app through its paces during the test runs.

    I'll dig out the code when I am back at a desktop.

    Update: OK the principle is demonstrated in this (ancient) code:

    (***********************************************************
      StackUse - A unit to report stack usage information
    
      by Richard S. Sadowsky
      version 1.0 7/18/88
      released to the public domain
    
      Inspired by an idea by Kim Kokkonen.
    
      This unit, when used in a Turbo Pascal 4.0 program, will
      automatically report information about stack usage.  This is very
      useful during program development.  The following information is
      reported about the stack:
    
      total stack space
      Unused stack space
      Stack space used by your program
    
      The unit's initialization code handles three things, it figures out
      the total stack space, it initializes the unused stack space to a
      known value, and it sets up an ExitProc to automatically report the
      stack usage at termination.  The total stack space is calculated by
      adding 4 to the current stack pointer on entry into the unit.  This
      works because on entry into a unit the only thing on the stack is the
      2 word (4 bytes) far return value.  This is obviously version and
      compiler specific.
    
      The ExitProc StackReport handles the math of calculating the used and
      unused amount of stack space, and displays this information.  Note
      that the original ExitProc (Sav_ExitProc) is restored immediately on
      entry to StackReport.  This is a good idea in ExitProc in case a
      runtime (or I/O) error occurs in your ExitProc!
    
      I hope you find this unit as useful as I have!
    
    ************************************************************)
    
    {$R-,S-} { we don't need no stinkin range or stack checking! }
    unit StackUse;
    
    interface
    
    var
      Sav_ExitProc     : Pointer; { to save the previous ExitProc }
      StartSPtr        : Word;    { holds the total stack size    }
    
    implementation
    
    {$F+} { this is an ExitProc so it must be compiled as far }
    procedure StackReport;
    
    { This procedure may take a second or two to execute, especially }
    { if you have a large stack. The time is spent examining the     }
    { stack looking for our init value ($AA). }
    
    var
      I                : Word;
    
    begin
      ExitProc := Sav_ExitProc; { restore original exitProc first }
    
      I := 0;
      { step through stack from bottom, stop at the first byte that is not $AA }
      while I < SPtr do
        if Mem[SSeg:I] <> $AA then begin
          { found the first overwritten byte, so report the stack usage info }
          WriteLn('total stack space : ',StartSPtr);
          WriteLn('unused stack space: ', I);
          WriteLn('stack space used  : ',StartSPtr - I);
          I := SPtr; { end the loop }
        end
        else
          inc(I); { look in next byte }
    end;
    {$F-}
    
    
    begin
      StartSPtr := SPtr + 4; { on entry into a unit, only the FAR return }
                             { address has been pushed on the stack.     }
                             { therefore adding 4 to SP gives us the     }
                             { total stack size. }
      FillChar(Mem[SSeg:0], SPtr - 20, $AA); { init the stack   }
      Sav_ExitProc := ExitProc;              { save exitproc    }
      ExitProc     := @StackReport;          { set our exitproc }
    end.
    

    (From http://webtweakers.com/swag/MEMORY/0018.PAS.html)

    I faintly remember having worked with Kim Kokkonen at that time, and I think the original code is from him.

    The good thing about this approach is that it has zero performance penalty and needs no profiling while the program runs. Only at shutdown does the scan for the first changed byte eat up CPU cycles. (We coded that one in assembly later.)
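
    The paint-and-scan principle can be shown without touching a real stack. The following is only a sketch: TRegion is a hypothetical byte array standing in for the stack region, and the names PaintRegion and UnusedBytes are made up for illustration. As in the StackUse unit, a used byte that happens to equal the marker is counted as unused, so the result is an estimate.

```pascal
program HighWaterDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  FillPattern = $AA;  // same marker byte as the StackUse unit

type
  // Hypothetical stand-in for a stack region; index 0 is the lowest address.
  TRegion = array[0..4095] of Byte;

// Paint the whole region with the marker byte before the work starts.
procedure PaintRegion(var R: TRegion);
begin
  FillChar(R, SizeOf(R), FillPattern);
end;

// Count untouched bytes from the low end. A stack grows downward, so the
// first byte that no longer holds the marker is the deepest point reached.
function UnusedBytes(const R: TRegion): Integer;
begin
  Result := 0;
  while (Result <= High(R)) and (R[Result] = FillPattern) do
    Inc(Result);
end;

var
  Region: TRegion;
  I: Integer;
begin
  PaintRegion(Region);

  // Simulate a run that overwrote the top 1000 bytes of the region.
  for I := High(Region) - 999 to High(Region) do
    Region[I] := 0;

  WriteLn('unused: ', UnusedBytes(Region));                   // 3096
  WriteLn('used  : ', SizeOf(Region) - UnusedBytes(Region));  // 1000
end.
```

    The scan runs once, after the work is done, which is why the approach costs nothing while the program is running.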

  • 2020-12-15 11:55

    Whilst I am sure that you can reduce the thread stack size in your app, I don't think it will address the root cause of the problem. You are using an 8-core machine now, but what happens on a 16-core or a 32-core machine?

    With 32-bit Delphi you have a maximum address space of 4GB, and this does limit you to some degree. You may well need to use smaller stacks for some or all of your threads, but you will still face problems on a big enough machine.

    To help your app scale better to larger machines, you may need to take one or both of the following steps:

    1. Avoid creating significantly more threads than cores. Use a thread pool architecture that is available to your plug-ins. Without the .NET environment to make this easy, you will be best off coding against the Windows thread pool API. That said, there must be a good Delphi wrapper available.
    2. Deal with the memory allocation patterns. If your threads are allocating contiguous blocks in the region of 200MB then this is going to cause undue stress on your allocator. I have found that it is often best to allocate such large amounts of memory in smaller, fixed size blocks. This approach works around the fragmentation problems you are encountering.
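
    A minimal sketch of the second point, assuming a hypothetical TChunkedBuffer class that stores one logical buffer as many independent 1 MB blocks. The class name, the block size and the demo size are all assumptions chosen for illustration; the point is only that no single allocation needs a large contiguous range of address space.

```pascal
program ChunkedBufferDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  ChunkSize = 1024 * 1024;  // assumed block size, tune for your data

type
  // Hypothetical container: one logical buffer stored as many small blocks.
  TChunkedBuffer = class
  private
    FChunks: array of PByte;
    FSize: Int64;
  public
    constructor Create(ASize: Int64);
    destructor Destroy; override;
    function ByteAt(Index: Int64): PByte;
    property Size: Int64 read FSize;
  end;

constructor TChunkedBuffer.Create(ASize: Int64);
var
  I: Integer;
begin
  inherited Create;
  FSize := ASize;
  // Round up so a partial last block is still backed by memory.
  SetLength(FChunks, Integer((ASize + ChunkSize - 1) div ChunkSize));
  for I := 0 to High(FChunks) do
    FChunks[I] := AllocMem(ChunkSize);  // zero-filled, each block independent
end;

destructor TChunkedBuffer.Destroy;
var
  I: Integer;
begin
  for I := 0 to High(FChunks) do
    if FChunks[I] <> nil then
      FreeMem(FChunks[I]);
  inherited Destroy;
end;

// Map a logical offset to (block, offset inside the block).
function TChunkedBuffer.ByteAt(Index: Int64): PByte;
begin
  Result := FChunks[Integer(Index div ChunkSize)];
  Inc(Result, Integer(Index mod ChunkSize));
end;

var
  Buf: TChunkedBuffer;
begin
  // Small demo size: five full blocks plus a partial sixth one.
  Buf := TChunkedBuffer.Create(Int64(5) * ChunkSize + 123);
  try
    Buf.ByteAt(Buf.Size - 1)^ := 42;
    WriteLn('last byte: ', Buf.ByteAt(Buf.Size - 1)^);
  finally
    Buf.Free;
  end;
end.
```

    The trade-off is that code which expects one flat pointer has to be changed to work block by block.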
  • 2020-12-15 12:04

    Even if all 8 threads were to come close to using their 1MB of stack, that's only 8MB of virtual memory. IIRC, the default initial stack size for threads is 64K, increasing upon page-faults unless the process thread-stack limit is reached, at which point I assume your process will be stopped with a 'Stack overflow' messageBox :((

    I fear that reducing the process stack limit $MAXSTACKSIZE will not alleviate your fragmentation/paging issue much, if at all. You need more RAM so that the resident page set of your mega-photo-app is bigger and thrashing is reduced.

    How many threads are there, overall, on average, in your process? Task manager can show this.

    Rgds, Martin

  • 2020-12-15 12:06

    Reducing $MAXSTACKSIZE won't work because Windows will always align thread stacks to 1 MB (?).

    One (possible?) way to prevent fragmentation is to reserve (not alloc!) virtual memory (with VirtualAlloc) before creating the threads, and to release it once the threads are running. This way Windows cannot use the reserved space for the thread stacks, so you will have some contiguous address space left.
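
    A rough sketch of that reserve-then-release idea, using the Win32 calls VirtualAlloc and VirtualFree. The 256 MB size is an assumption, the thread creation is left as a comment, and the unit is called Winapi.Windows in newer Delphi versions. Like the idea itself, this is untested as a cure for fragmentation.

```pascal
program ReserveDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Windows;  // Winapi.Windows in Delphi XE2 and later

const
  ReserveSize = 256 * 1024 * 1024;  // assumed size of the range to protect

var
  Hole: Pointer;
begin
  // MEM_RESERVE claims address space only; no physical memory or
  // page-file space is committed for it.
  Hole := VirtualAlloc(nil, ReserveSize, MEM_RESERVE, PAGE_NOACCESS);
  if Hole = nil then
  begin
    WriteLn('reserve failed, error ', GetLastError);
    Halt(1);
  end;

  // ... create the worker threads here; their stacks cannot be placed
  // inside the reserved range ...

  // Give the placeholder back. With MEM_RELEASE the size must be 0.
  if not VirtualFree(Hole, 0, MEM_RELEASE) then
    WriteLn('release failed, error ', GetLastError);

  // The range is free again. A later large allocation may land in it,
  // but nothing guarantees that it will.
end.
```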

    Or you could write your own memory manager for large photos: reserve a lot of virtual memory and allocate memory from this pool by hand (you need to maintain a list of used and unused memory yourself).

    At least, that's a theory, don't know if it really works...
