Question
I am working on a program that migrates files from potentially large directory structures, and there are many files (approx. 1 million).
My migration code already works quite well. I am using a class to iterate over the directory structure, identify the files, and migrate them sequentially, one after another.
Now I want to make better use of the available CPU resources of the target machine and run those migrations asynchronously, grabbing threads from a System.Threading.TThreadPool to execute them.
I am familiar with the ITask interface and with how to use TTask to set up an array of tasks that will be managed in conjunction with a TThreadPool instance.
However, setting up a big TArray<ITask> array and waiting for completion once all the directories have been walked seems an inappropriate and inefficient approach (especially with regard to memory consumption).
What I believe I need is just a simple thread-safe producer/consumer queue that grows as files are discovered and shrinks as worker threads become available to consume the tasks and complete them.
I found something in the Embarcadero docs that sounds promising in this regard, called TWorkStealingQueue, but as so often, the documentation is pretty poor and lacks concise examples of how to use it.
The TArray<ITask> approach would boil down to something like this:
type
  TMigrationFileWalker = class(TFileWalker)
  strict private
    var
      // Grows by one entry per file, i.e. up to ~1 million ITask references
      FPendingMigrationTasks : TArray<ITask>;
    function createMigrationTask(const filename : string) : ITask;
  strict protected
    procedure onHandleFile(const filename : string); override;
  public
    procedure walkDirectoryTree(const startDir : string); override;
  end;

implementation

procedure TMigrationFileWalker.onHandleFile(const filename : string);
var
  migrationTask : ITask;
begin
  migrationTask := createMigrationTask(filename);
  // Keep a reference so that walkDirectoryTree can wait for the task later
  self.FPendingMigrationTasks := self.FPendingMigrationTasks + [migrationTask];
  migrationTask.Start();
end;

procedure TMigrationFileWalker.walkDirectoryTree(const startDir : string);
begin
  inherited walkDirectoryTree(startDir);
  // Block until every task started during the walk has finished
  TTask.WaitForAll(self.FPendingMigrationTasks, SOME_REASONABLE_TIMEOUT);
end;
Of course I could use a thread-safe producer/consumer queue and manage a bunch of threads working on it myself. But the promise of TWorkStealingQueue seems to be that it works with a thread pool, and I'd like to take advantage of the load-balancing mechanisms that already come with it.
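For comparison, the plain producer/consumer alternative mentioned above could look roughly like the sketch below. It assumes TThreadedQueue<string> from System.Generics.Collections as the bounded queue. WORKER_COUNT, QUEUE_DEPTH, STOP_MARKER, MigrateFile and the sample file names are hypothetical placeholders, not part of my actual code. It does not use TWorkStealingQueue at all.

```delphi
program BoundedMigrationQueueSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Threading, System.Generics.Collections;

const
  WORKER_COUNT = 4;    // hypothetical number of consumers
  QUEUE_DEPTH  = 1024; // hypothetical cap on pending file names
  STOP_MARKER  = '';   // hypothetical sentinel; a real file name is never empty

var
  queue    : TThreadedQueue<string>;
  workers  : TArray<ITask>;
  migrated : Integer = 0;
  i        : Integer;

procedure MigrateFile(const filename : string);
begin
  // Placeholder for the real migration; here it only counts the call.
  AtomicIncrement(migrated);
end;

begin
  // With the default (infinite) timeouts, PushItem blocks while the queue
  // is full and PopItem blocks while it is empty, so memory stays bounded.
  queue := TThreadedQueue<string>.Create(QUEUE_DEPTH);
  try
    SetLength(workers, WORKER_COUNT);
    for i := 0 to High(workers) do
      workers[i] := TTask.Run(
        procedure
        var
          filename : string;
        begin
          // Consume file names until a stop marker arrives.
          filename := queue.PopItem;
          while filename <> STOP_MARKER do
          begin
            MigrateFile(filename);
            filename := queue.PopItem;
          end;
        end);

    // Producer side: roughly what onHandleFile would do per file found.
    queue.PushItem('a.dat');
    queue.PushItem('b.dat');
    queue.PushItem('c.dat');

    // One stop marker per worker so that every consumer loop terminates.
    for i := 1 to WORKER_COUNT do
      queue.PushItem(STOP_MARKER);

    TTask.WaitForAll(workers);
    Writeln('migrated files: ', migrated);
  finally
    queue.Free;
  end;
end.
```

The drawback of this variant is that each worker occupies a pool thread for the whole run and blocks in PopItem while idle, which is exactly the kind of manual management I was hoping a pool-aware queue would spare me.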
Has anyone here already used TWorkStealingQueue and can give a short, concise example of how it could be used in a scenario like the one described above? Or at least clarify the actual purpose of that class, in case I totally misunderstood it from the naming?
Searching for TWorkStealingQueue didn't yield anything better than links back to the insufficient Embarcadero documentation.
Source: https://stackoverflow.com/questions/51571759/what-is-the-purpose-of-tworkstealingqueue-and-how-to-use-it