Application hangs in SysUtils -> DoneMonitorSupport on exit

南笙 2020-12-30 03:36

I am writing a very thread-intensive application that hangs when it exits.

I've traced into the system units and found the place where the program enters an infinite loop…

4 Answers
  • 2020-12-30 04:18

    I could reproduce your problem using the example provided by Cosmin. I could also solve the problem by simply freeing the SyncObj after all threads are done.

    As I have no access to your code, I cannot say more, but probably some object instance used by TMonitor isn't freed.
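    A minimal sketch of the fix described in this answer, assuming Delphi XE2 or later (for unit-scoped names such as System.Classes). TWorker, SyncObj and the iteration counts are hypothetical. What matters is the ordering: every thread that used SyncObj with TMonitor is joined before SyncObj is freed, and SyncObj is freed before the program's units are finalized.

    ```pascal
    program FreeSyncObjAfterThreads;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    type
      // Hypothetical worker thread: increments a shared counter under TMonitor.
      TWorker = class(TThread)
      private
        FLock: TObject;
      protected
        procedure Execute; override;
      public
        constructor Create(ALock: TObject);
      end;

    var
      Counter: Integer = 0;

    constructor TWorker.Create(ALock: TObject);
    begin
      FLock := ALock;
      inherited Create(False); // FreeOnTerminate stays False so we can WaitFor
    end;

    procedure TWorker.Execute;
    var
      I: Integer;
    begin
      for I := 1 to 100000 do
      begin
        TMonitor.Enter(FLock); // under contention this allocates FLock's sync event
        try
          Inc(Counter);
        finally
          TMonitor.Exit(FLock);
        end;
      end;
    end;

    var
      SyncObj: TObject;
      Workers: array[0..3] of TWorker;
      I: Integer;
    begin
      SyncObj := TObject.Create;
      for I := Low(Workers) to High(Workers) do
        Workers[I] := TWorker.Create(SyncObj);

      // Join every thread first, so nothing can still be inside Enter/Exit.
      for I := Low(Workers) to High(Workers) do
      begin
        Workers[I].WaitFor;
        Workers[I].Free;
      end;

      // Only now free the lock object; destroying it releases its monitor's
      // sync event before SysUtils' finalization (DoneMonitorSupport) runs.
      SyncObj.Free;

      Writeln('Counter = ', Counter);
    end.
    ```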

  • 2020-12-30 04:22

    I've been looking at how the TMonitor locks are implemented, and I finally made an interesting discovery. For a bit of drama, I'll first tell you how the locks work.

    The first time you call any TMonitor function on a TObject, a new instance of the TMonitor record is created and assigned to a MonitorFld inside the object itself. This assignment is made in a thread-safe way, using InterlockedCompareExchangePointer. Thanks to this trick, a TObject only needs one pointer-sized field to support TMonitor instead of the full TMonitor structure. And that's a good thing.

    The TMonitor structure contains a number of fields. We'll start with the FLockCount: Integer field. When the first thread calls TMonitor.Enter() on an object, this combined lock-counter field is zero. Again using InterlockedCompareExchange, the lock is acquired and the counter is initialized. The calling thread does not block and there is no context switch, since this is all done in-process.
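    To make the two compare-exchange tricks above concrete, here is a deliberately simplified model, not the RTL's code. THypoMonitor, GetMonitor, TryEnterFast and ExitFast are hypothetical names, and TInterlocked from System.SyncObjs stands in for the RTL's internal interlocked helpers. The real TMonitor also tracks the owning thread, recursion and waiting threads, all of which are omitted here.

    ```pascal
    program LazyMonitorSketch;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.SyncObjs;

    type
      // Hypothetical stand-in for the TMonitor record (not the real layout).
      PHypoMonitor = ^THypoMonitor;
      THypoMonitor = record
        LockCount: Integer; // 0 = free, 1 = owned (no recursion, no waiters)
      end;

    // Lazily publish a record into a pointer-sized slot. If two threads race,
    // the loser disposes its own copy and uses the winner's.
    function GetMonitor(var Slot: Pointer): PHypoMonitor;
    var
      Candidate: PHypoMonitor;
    begin
      Result := Slot;
      if Result = nil then
      begin
        New(Candidate);
        Candidate^.LockCount := 0;
        if TInterlocked.CompareExchange(Slot, Candidate, nil) <> nil then
          Dispose(Candidate); // another thread published first
        Result := Slot;
      end;
    end;

    // Uncontended fast path: atomically flip LockCount from 0 to 1.
    function TryEnterFast(M: PHypoMonitor): Boolean;
    begin
      Result := TInterlocked.CompareExchange(M^.LockCount, 1, 0) = 0;
    end;

    procedure ExitFast(M: PHypoMonitor);
    begin
      TInterlocked.Exchange(M^.LockCount, 0);
    end;

    var
      Slot: Pointer = nil;
      M: PHypoMonitor;
      First, Second: Boolean;
    begin
      M := GetMonitor(Slot);     // allocates and publishes the record
      First := TryEnterFast(M);  // lock was free: succeeds
      Second := TryEnterFast(M); // already held: fails (real code would spin, then wait)
      Writeln('first: ', First, ', second: ', Second);
      ExitFast(M);
      Dispose(M);
    end.
    ```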

    When a second thread calls TMonitor.Enter() on the same object, its first attempt to acquire the lock fails. When that happens, Delphi applies two strategies:

    • If the developer used TMonitor.SetSpinCount() to set a number of "spins", then Delphi will do a busy-wait loop, spinning the given number of times. That's very nice for tiny locks because it allows acquiring the lock without doing a context-switch.
    • If the spin count runs out (or no spin count was set; by default it is zero), TMonitor.Enter() waits on the event returned by TMonitor.GetEvent(). In other words, it does not busy-wait and waste CPU cycles. Remember TMonitor.GetEvent(), because it is very important.
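    A short sketch of where the spin-count setting goes, assuming Delphi XE2 or later (TMonitor itself dates from Delphi 2009; the unit-scoped System.SysUtils name needs XE2). The value 1000 is an arbitrary illustration. With a single thread there is no contention, so the program only shows the call sequence, not the spinning itself.

    ```pascal
    program SpinCountSketch;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    var
      Lock: TObject;
    begin
      Lock := TObject.Create;
      try
        // On contention, spin up to 1000 times before waiting on the
        // monitor's event (1000 is an illustrative value, not a recommendation).
        TMonitor.SetSpinCount(Lock, 1000);

        TMonitor.Enter(Lock);
        try
          Writeln('Holding the lock');
        finally
          TMonitor.Exit(Lock);
        end;
      finally
        // Freeing Lock also disposes its TMonitor record and any sync event.
        Lock.Free;
      end;
    end.
    ```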

    Let's say we've got one thread that acquired the lock and another that tried to acquire it and is now waiting on the event returned by TMonitor.GetEvent(). When the first thread calls TMonitor.Exit(), it notices (via the FLockCount field) that at least one other thread is blocked, so it immediately pulses the event, which should normally have been allocated already (it calls TMonitor.GetEvent()). But since the two threads, the one calling TMonitor.Exit() and the one that called TMonitor.Enter(), might call TMonitor.GetEvent() at the same time, there are a couple more tricks inside TMonitor.GetEvent() to make sure that only one event is allocated, regardless of the order of operations.

    For a few more fun moments, let's delve into how TMonitor.GetEvent() works. It lives inside the System unit (you know, the one we can't recompile to play with), but it turns out it delegates the job of actually allocating the event to another unit, through the System.MonitorSupport pointer. That points to a record of type TMonitorSupport that declares five function pointers:

    • NewSyncObject - allocates a new Event for Synchronization purposes
    • FreeSyncObject - deallocates the Event allocated for Synchronization purposes
    • NewWaitObject - allocates a new Event for Wait operations
    • FreeWaitObject - deallocates that Wait event
    • WaitAndOrSignalObject - well.. waits or signals.

    It also turns out that the objects returned by the NewXYZ functions could be anything, because they are only ever passed back to the wait/signal function and to the corresponding FreeXYZObject call. The way those functions are implemented in SysUtils is designed to give those locks a minimum of locking and context switching. Because of that, the objects themselves (returned by NewSyncObject and NewWaitObject) are not directly the events returned by CreateEvent(), but pointers to records in the SyncEventCacheArray. It goes even further: actual Windows events are not created until they are needed. So each record in the SyncEventCacheArray contains two fields:

    • TSyncEventItem.Lock - this tells Delphi whether the Lock is currently in use by anything, and
    • TSyncEventItem.Event - this holds the actual Event that'll be used for synchronization, if waiting is required.

    When the application terminates, SysUtils.DoneMonitorSupport goes over all the records in the SyncEventCacheArray and waits for each Lock to become zero, i.e., for the entry to stop being used. In theory, as long as a Lock is not zero, at least one thread out there might still be using it, so the sane thing to do is to wait rather than risk access violations. And that finally brings us to the question at hand: the hang in SysUtils.DoneMonitorSupport.

    Why might an application hang in SysUtils.DoneMonitorSupport even if all its threads terminated properly?

    Because at least one event allocated using NewSyncObject or NewWaitObject was never freed using its corresponding FreeSyncObject or FreeWaitObject. That brings us back to TMonitor.GetEvent(). The event it allocates is saved in the TMonitor record that belongs to the object used for TMonitor.Enter(). The pointer to that record is kept only in that object's instance data, and it stays there for as long as the object lives. Searching for the name of the field, FLockEvent, we find this in System.pas:

    procedure TMonitor.Destroy;
    begin
      if (MonitorSupport <> nil) and (FLockEvent <> nil) then
        MonitorSupport.FreeSyncObject(FLockEvent);
      Dispose(@Self);
    end;
    

    and a call to that record destructor in TObject.CleanupInstance.

    In other words, the final sync-event is only released when the object that was used for synchronization is freed!

    Answer to OP's question:

    The application hangs because at least one OBJECT that was used for TMonitor.Enter() was not freed.
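    A sketch of one way to guarantee that a long-lived lock object is freed, assuming it is held in a unit-level variable. SharedLock and GLock are hypothetical names. The implementation-section uses of System.SysUtils is there so that, under Delphi's normal (non-circular) unit initialization order, SysUtils is initialized before this unit and therefore finalized after it, meaning GLock is freed before DoneMonitorSupport runs. Any threads that use GLock must already have finished when this finalization section executes.

    ```pascal
    unit SharedLock;

    // Hypothetical unit holding a process-wide lock object used with TMonitor.

    interface

    var
      GLock: TObject;

    implementation

    uses
      System.SysUtils; // SysUtils initializes before, and finalizes after, this unit

    initialization
      GLock := TObject.Create;

    finalization
      // Without this Free, the monitor's sync event is never handed back via
      // FreeSyncObject, and DoneMonitorSupport can wait on its Lock forever.
      // Assumes every thread that used GLock has already terminated.
      GLock.Free;

    end.
    ```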

    Possible solutions:

    Unfortunately, I don't like this. It's not right: the penalty for not freeing a small object should be a small memory leak, not a hanging application! This is especially bad for service applications, where a service might simply hang forever, neither fully shut down nor able to respond to any request.

    The solution for the Delphi team? They should NOT hang in the finalization code of the SysUtils unit, no matter what. They should probably ignore the Lock and move on to closing the event handle. At that stage (finalization of the SysUtils unit), if there's still code running in some thread, it's in really bad shape: most of the units have already been finalized, so it's not running in the environment it was designed to run in.

    For Delphi users? We can replace MonitorSupport with our own version, one that doesn't do those extensive checks at finalization time.

  • 2020-12-30 04:34

    I've worked around the bug in the following way:

    Copy System.SysUtils, InterlockedAPIs.inc and EncodingData.inc to my application directory and alter the following code in System.SysUtils:

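      procedure CleanEventList(var EventCache: array of TSyncEventItem);
      var
        I: Integer;
      begin
        for I := Low(EventCache) to High(EventCache) do
        begin
          // Try once to claim the entry (Lock 0 -> 1) and free its event only
          // if no one holds it. Entries still marked as in use are skipped;
          // Windows closes their handles when the process exits anyway.
          if InterlockedCompareExchange(EventCache[I].Lock, 1, 0) = 0 then
            DeleteSyncWaitObj(EventCache[I].Event);
          // Original code, which spins forever if a Lock never returns to 0:
          //repeat until InterlockedCompareExchange(EventCache[I].Lock, 1, 0) = 0;
          //DeleteSyncWaitObj(EventCache[I].Event);
        end;
      end;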
    

    I also added this check at the top of System.SysUtils to remind me to update the System.SysUtils file if I change Delphi versions:

    {$IFNDEF VER230}
    !!!!!!!!!!!!!!!!
    You need to update this unit to fix the bug at line 19868
    See http://stackoverflow.com/questions/14217735/application-hangs-in-sysutils-donemonitorsupport-on-exit
    !!!!!!!!!!!!!!!!
    {$ENDIF}
    

    After these changes my application shuts down correctly.

    Note: I tried adding "ReportMemoryLeaksOnShutdown" as LU RD suggested, but on shutdown my app hit a race condition and popped up numerous runtime error dialogs. Something similar happens when I try EurekaLog's memory-leak functionality.

  • 2020-12-30 04:39

    In Delphi XE5, Embarcadero fixed this by adding "(Now - Start > 1 / MSecsPerDay) or" to the condition of the repeat..until loop in CleanEventList, so that it gives up after 1 millisecond. It then deletes the event regardless of whether Lock was 0.
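    A self-contained model of that bounded wait, assuming Delphi XE2 or later. It is not the RTL's code: THypoSyncEventItem and CleanEventListBounded are hypothetical names, CloseHandle stands in for the internal DeleteSyncWaitObj, and whether XE5 measures the timeout per entry or once for the whole list is not stated above, so this sketch uses a single start time for the whole list.

    ```pascal
    program BoundedCleanupSketch;

    {$APPTYPE CONSOLE}

    uses
      Winapi.Windows, System.SysUtils, System.SyncObjs;

    type
      // Hypothetical mirror of a SyncEventCacheArray entry, not the RTL's type.
      THypoSyncEventItem = record
        Lock: Integer;  // non-zero while the entry is (or appears to be) in use
        Event: THandle; // Windows event handle, 0 if none was created
      end;

    // Wait for each entry's Lock, but stop waiting once TimeoutMs has elapsed
    // since cleanup started, and close the event either way.
    procedure CleanEventListBounded(var Cache: array of THypoSyncEventItem;
      TimeoutMs: Cardinal);
    var
      I: Integer;
      Start: TDateTime;
    begin
      Start := Now;
      for I := Low(Cache) to High(Cache) do
      begin
        repeat
        until (Now - Start > TimeoutMs / MSecsPerDay) or
              (TInterlocked.CompareExchange(Cache[I].Lock, 1, 0) = 0);
        if Cache[I].Event <> 0 then
        begin
          CloseHandle(Cache[I].Event);
          Cache[I].Event := 0;
        end;
      end;
    end;

    var
      Cache: array[0..1] of THypoSyncEventItem;
    begin
      Cache[0].Lock := 0; // free entry: claimed immediately
      Cache[0].Event := CreateEvent(nil, False, False, nil);
      Cache[1].Lock := 1; // simulates an entry whose owner never released it
      Cache[1].Event := CreateEvent(nil, False, False, nil);

      CleanEventListBounded(Cache, 1);
      Writeln('Cleanup finished without hanging');
    end.
    ```

    The trade-off is the one the second answer argues for: after a short grace period the cleanup closes the handle anyway, accepting a small risk to a thread that is still running during finalization in exchange for never hanging on exit.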
