I have an app with a number of worker threads, one for each core. On a modern 8 core machine, I have 8 of these threads. My app loads a lot of plugins, which also have their
Use this to compute the amount of memory committed for the current thread's stack:
    function CommittedStackSize: Cardinal;
    asm
      mov eax,[fs:$4] // base of the stack, from the Thread Environment Block (TEB)
      mov edx,[fs:$8] // address of lowest committed stack page
                      // this gets lower as you use more stack
      sub eax,edx
    end;
Another idea I don't have.
For the sake of completeness, I am adding a version of the CommittedStackSize function provided in opc0de's answer for determining the amount of used stack that will work on both 32-bit and 64-bit x86 versions of Windows (opc0de's function is for Win32 only).
opc0de's function reads the address of the base of the stack and the address of the lowest committed stack page from Windows' Thread Information Block (TIB). There are two differences between x86 and x64:
1. The TIB is addressed through the FS segment register on Win32, but through the GS segment register on Win64.
2. The offsets of the two fields differ (04h and 08h on Win32, 08h and 10h on Win64), because the fields in the TIB are pointer-sized.
Additionally, note that there is a small difference in the BASM code, because on x64, abs is required to make the assembler use an absolute offset from the segment register.
Therefore, a version that works on both Win32 and Win64 looks like this:
    {$IFDEF MSWINDOWS}
    function CommittedStackSize: NativeUInt;
    // NB: Win32 uses FS, Win64 uses GS as base for the Thread Information Block.
    asm
    {$IFDEF WIN32}
      mov eax, [fs:04h]     // TIB: base of the stack
      mov edx, [fs:08h]     // TIB: lowest committed stack page
      sub eax, edx          // compute difference in EAX (=Result)
    {$ENDIF}
    {$IFDEF WIN64}
      mov rax, abs [gs:08h] // TIB: base of the stack
      mov rdx, abs [gs:10h] // TIB: lowest committed stack page
      sub rax, rdx          // compute difference in RAX (=Result)
    {$ENDIF}
    end;                    // closes the function before the outer $ENDIF
    {$ENDIF}
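As a cross-check that needs no assembler, the same figure can probably be obtained by asking Windows about the memory region that holds a local variable. This is only a sketch of a different technique (VirtualQuery instead of reading the TIB), and QueryThreadStack is a hypothetical helper name, not something from the answers here. It assumes a Delphi version with the Winapi.Windows unit and that nothing has changed the page protection inside the stack.

```pascal
program StackQueryDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

// Hypothetical helper: reports the committed and the reserved size of the
// calling thread's stack by querying the region around a local variable.
procedure QueryThreadStack(out Committed, Reserved: NativeUInt);
var
  Marker: Byte;                    // lives on the current thread's stack
  Info: TMemoryBasicInformation;
begin
  Committed := 0;
  Reserved := 0;
  if VirtualQuery(@Marker, Info, SizeOf(Info)) = 0 then
    Exit;                          // query failed, report zeros
  // The region containing Marker is the committed, accessible part of the stack.
  Committed := Info.RegionSize;
  // AllocationBase is the lowest address of the whole stack reservation.
  Reserved := NativeUInt(Info.BaseAddress) + Info.RegionSize
            - NativeUInt(Info.AllocationBase);
end;

var
  C, R: NativeUInt;
begin
  QueryThreadStack(C, R);
  Writeln('committed: ', C, ' bytes, reserved: ', R, ' bytes');
end.
```

Calling the helper from inside each worker thread shows how much of its reservation that thread has actually touched so far, which is the number you need before deciding on a smaller stack size.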
I remember I FillChar'd all available stack space with zeroes upon init years ago, and counted the contiguous zeroes upon deinit, starting from the end. This yielded a good 'high water mark', provided you put your app through its paces during the probe runs.
I'll dig out the code when I am back nonmobile.
Update: OK the principle is demonstrated in this (ancient) code:
{***********************************************************
 StackUse - A unit to report stack usage information
 by Richard S. Sadowsky
 version 1.0 7/18/88
 released to the public domain
 Inspired by an idea by Kim Kokkonen.
 This unit, when used in a Turbo Pascal 4.0 program, will
 automatically report information about stack usage. This is very
 useful during program development. The following information is
 reported about the stack:
   total stack space
   unused stack space
   stack space used by your program
 The unit's initialization code handles three things: it figures out
 the total stack space, it initializes the unused stack space to a
 known value, and it sets up an ExitProc to automatically report the
 stack usage at termination. The total stack space is calculated by
 adding 4 to the current stack pointer on entry into the unit. This
 works because on entry into a unit the only thing on the stack is the
 2 word (4 bytes) far return value. This is obviously version and
 compiler specific.
 The ExitProc StackReport handles the math of calculating the used and
 unused amount of stack space, and displays this information. Note
 that the original ExitProc (Sav_ExitProc) is restored immediately on
 entry to StackReport. This is a good idea in an ExitProc in case a
 runtime (or I/O) error occurs in your ExitProc!
 I hope you find this unit as useful as I have!
***********************************************************}
{$R-,S-} { we don't need no stinkin range or stack checking! }
unit StackUse;
interface
var
  Sav_ExitProc : Pointer; { to save the previous ExitProc }
  StartSPtr    : Word;    { holds the total stack size }
implementation
{$F+} { this is an ExitProc so it must be compiled as far }
procedure StackReport;
{ This procedure may take a second or two to execute, especially }
{ if you have a large stack. The time is spent examining the     }
{ stack looking for the first byte that no longer holds our init }
{ value ($AA).                                                   }
var
  I : Word;
begin
  ExitProc := Sav_ExitProc; { restore original exitProc first }
  I := 0;
  { step through stack from bottom while the bytes still hold $AA }
  while I < SPtr do
    if Mem[SSeg:I] <> $AA then begin
      { first overwritten byte found, so report the stack usage info }
      WriteLn('total stack space : ',StartSPtr);
      WriteLn('unused stack space: ', I);
      WriteLn('stack space used : ',StartSPtr - I);
      I := SPtr; { end the loop }
    end
    else
      inc(I); { look in next byte }
end;
{$F-}
begin
  StartSPtr := SPtr + 4; { on entry into a unit, only the FAR return }
                         { address has been pushed on the stack.     }
                         { therefore adding 4 to SP gives us the     }
                         { total stack size.                         }
  FillChar(Mem[SSeg:0], SPtr - 20, $AA); { init the stack }
  Sav_ExitProc := ExitProc;              { save exitproc }
  ExitProc := @StackReport;              { set our exitproc }
end.
(From http://webtweakers.com/swag/MEMORY/0018.PAS.html)
I faintly remember having worked with Kim Kokkonen at that time, and I think the original code is from him.
The good thing about this approach is that you have zero performance penalty and no profiling operation during the program run. Only upon shutdown does the loop-until-changed-value-found code eat up CPU cycles. (We coded that one in assembly later.)
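To make the principle easier to see outside of Turbo Pascal 4.0, here is a minimal sketch that applies the same paint-and-scan idea to an ordinary byte array standing in for the stack. The names Buf, BufSize and UnusedBytes are made up for this illustration, and the "usage" is simulated by overwriting the top 1000 bytes; it does not touch a real thread stack.

```pascal
program HighWaterDemo;
{$APPTYPE CONSOLE}

const
  FillByte = $AA;   // same sentinel value as in the StackUse unit
  BufSize  = 4096;  // hypothetical size of the stand-in "stack"

var
  Buf: array[0..BufSize - 1] of Byte;  // stands in for the stack segment

// Counts the untouched sentinel bytes, starting at the low end of the buffer.
function UnusedBytes: Integer;
begin
  Result := 0;
  while (Result < BufSize) and (Buf[Result] = FillByte) do
    Inc(Result);
end;

var
  I: Integer;
begin
  FillChar(Buf, SizeOf(Buf), FillByte);  // "paint" the buffer at startup
  // Simulate a stack that grows downwards from the top: use the top 1000 bytes.
  for I := BufSize - 1000 to BufSize - 1 do
    Buf[I] := 0;
  // Prints "unused: 3096 used: 1000"
  Writeln('unused: ', UnusedBytes, ' used: ', BufSize - UnusedBytes);
end.
```

The scan only gives a lower bound on usage if the program happens to write the sentinel value itself at the deepest point, which is why an unusual fill byte such as $AA tends to work better than zero.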
Whilst I am sure that you can reduce the thread stack size in your app, I don't think it will address the root cause of the problem. You are using an 8-core machine now, but what happens on a 16-core or a 32-core machine?
With 32-bit Delphi you have a maximum address space of 4GB, and so this does limit you to some degree. You may well need to use smaller stacks for some or all of your threads, but you will still face problems on a big enough machine.
To help your app scale better to larger machines, you may need to take one or other of the following steps:
Even if all 8 threads were to come close to using their 1MB of stack, that's only 8MB of virtual memory. IIRC, the default initial stack size for threads is 64K, increasing upon page-faults unless the process thread-stack limit is reached, at which point I assume your process will be stopped with a 'Stack overflow' messageBox :((
I fear that reducing the process stack limit $MAXSTACKSIZE will not alleviate your fragmentation/paging issue much, if at all. You need more RAM so that the resident page set of your mega-photo-app is bigger and thrashing is reduced.
How many threads are there, overall, on average, in your process? Task manager can show this.
Rgds, Martin
Reducing $MAXSTACKSIZE won't work because Windows will always align the thread stack to 1 MB (?).
One (possible?) way to prevent fragmentation is to reserve (not alloc!) virtual memory (with VirtualAlloc) before creating the threads, and to release it after the threads are running. This way Windows cannot use the reserved space for the threads, so you will have some contiguous memory.
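A rough sketch of that reserve-then-release idea, assuming a Delphi version with the Winapi.Windows unit. ReserveSize is a made-up figure for illustration, and the place where the threads get created is only marked by a comment. Whether this really helps against fragmentation is untested here.

```pascal
program ReserveDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

const
  ReserveSize = 256 * 1024 * 1024;  // hypothetical: 256 MB kept free for photos

var
  Block: Pointer;
begin
  // Reserve address space only; no physical memory or page file is committed.
  Block := VirtualAlloc(nil, ReserveSize, MEM_RESERVE, PAGE_NOACCESS);
  if Block = nil then
  begin
    Writeln('reserve failed, error ', GetLastError);
    Halt(1);
  end;

  // ... create the worker threads and load the plugins here ...
  // Their stacks and DLLs cannot be placed inside the reserved range.

  // Release the reservation; with MEM_RELEASE the size argument must be 0.
  if not VirtualFree(Block, 0, MEM_RELEASE) then
    Writeln('release failed, error ', GetLastError);
end.
```

After the release, the range is one free hole again, but nothing stops a later allocation (a new thread, a DLL loaded on demand) from landing in the middle of it, so large buffers are best allocated soon afterwards.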
Or you could make your own memory manager for large photos: reserve a lot of virtual memory and alloc memory from this pool by hand. (You need to maintain a list of used and free memory yourself.)
At least, that's a theory, don't know if it really works...