Question
I am modifying an existing Windows Workflow Foundation project that was previously coded to run everything synchronously. However, as the data set grew, this needed to change to meet performance requirements.
What I have:
Inside the workflow I have a parent Sequence Workflow that contains a few elementary workflows that basically set up a few services and prepare them to run. I then have the bulk of the workflow's work, which consists of a ForEach Workflow that operates on a collection of about 15,000 items that take about 1-3 seconds per item to process (timings are around 70% CPU, 10% network latency, 20% database querying/access). Obviously this takes way too long. I need to improve this time by about a factor of 5 (it takes around 5-6 hours and needs to get to about 1 hour).
Dilemma:
I have never worked with Windows Workflows before this project, so I am very unfamiliar with how to achieve otherwise simple implementations of concurrent execution on a collection.
Ideas:
I read about the different Workflow Activities and decided that a ParallelForEach Workflow Activity would probably be the way to go. My idea was that I would just swap my ForEach Workflow Activity for the ParallelForEach Workflow Activity and achieve concurrency the way Parallel.ForEach() works in the Task Parallel Library. Unfortunately, that does not seem to be how the ParallelForEach Workflow Activity is implemented. Instead of scheduling the work for each item across multiple threads and context switching when another thread is waiting, the ParallelForEach Workflow Activity seems to just put each iteration on a stack and operate on them almost synchronously, unless the body of the workflow is "Idle" (which I do not believe is the same thing as "waiting" on I/O; it seems to be an explicit state that needs to be set on a workflow activity). Per MSDN:
ParallelForEach enumerates its values and schedules the Body for every value it enumerates on. It only schedules the Body. How the body executes depends on whether the Body goes idle. If the Body does not go idle, it executes in a reverse order because the scheduled activities are handled as a stack, the last scheduled activity executes first. For example, if you have a collection of {1,2,3,4} in ParallelForEach and use a WriteLine as the body to write the value out. You have 4, 3, 2, 1 printed out in the console. This is because WriteLine does not go idle so after 4 WriteLine activities got scheduled, they executed using a stack behavior (first in last out).
But if you have activities in the Body that can go idle, like a Receive activity or Delay activity. Then there is no need to wait for them to complete. ParallelForEach goes to the next scheduled body activity and try to execute it. If that activity goes idle too, ParallelForEach moves on again the next body activity.
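The scheduling behavior described in the MSDN passage can be tried with a small code-based workflow. This is only a sketch: the class name `ParallelForEachDemo` and the method `Run` are made up for illustration, and the collection {1,2,3,4} is the one from the quote. According to the passage, the values should come out in reverse order because WriteLine never goes idle.

```csharp
using System;
using System.Activities;
using System.Activities.Statements;
using System.Collections.Generic;

// Hypothetical demo class; not part of the original project.
public static class ParallelForEachDemo
{
    public static void Run()
    {
        // Iteration variable handed to the Body for each value.
        var item = new DelegateInArgument<int>("item");

        var workflow = new ParallelForEach<int>
        {
            Values = new InArgument<IEnumerable<int>>(ctx => new List<int> { 1, 2, 3, 4 }),
            Body = new ActivityAction<int>
            {
                Argument = item,
                // WriteLine never goes idle, so every scheduled Body runs to
                // completion on the single workflow thread, one after another.
                Handler = new WriteLine
                {
                    Text = new InArgument<string>(ctx => item.Get(ctx).ToString())
                }
            }
        };

        WorkflowInvoker.Invoke(workflow);
    }
}
```

Calling `ParallelForEachDemo.Run()` from a console application that references System.Activities writes the four values, one per line.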
Where I am now:
When running my "idea" above with the ParallelForEach Workflow Activity, I achieve about the same running time as the normal ForEach Workflow Activity. I was considering making the underlying BeginWorkflow method async but I'm not sure if that will be a good idea or not with how Windows WF operates.
I need your help:
Does anyone have any suggestions on how I can achieve the results that I am trying to get to? Is there another way to implement something that would execute the body of the foreach workflow in parallel on as many threads as possible? I have 8 logical processors and I want to take advantage of all of them, since each iteration of the collection is independent of the others.
Any Ideas??
Answer 1:
The Workflow runtime is single-threaded within a workflow instance. To truly do parallel work, you have to manage your own threads (somehow). My guess is that your activities are simply doing their thing in the Execute method and the runtime will only allow one Execute at a time.
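One possible reading of "manage your own threads" is to hand the whole collection to the Task Parallel Library from inside a single piece of code, for example one activity's work method. The sketch below is an assumption, not something from the original project: `BatchProcessor`, `ProcessAll`, and the `processItem` delegate are hypothetical names, and the per-item work is whatever the ForEach body does today.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; names are illustrative only.
public static class BatchProcessor
{
    // Runs processItem for every element on at most maxThreads threads
    // and returns the number of items that were processed.
    public static int ProcessAll<T>(IEnumerable<T> items, Action<T> processItem, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        int processed = 0;

        Parallel.ForEach(items, options, item =>
        {
            processItem(item);
            // Several threads update the counter, so use an atomic increment.
            Interlocked.Increment(ref processed);
        });

        return processed;
    }
}
```

A call such as `BatchProcessor.ProcessAll(items, ProcessItem, Environment.ProcessorCount)` would use every logical processor. Parallel.ForEach blocks its caller until all items are done, so calling it directly from Execute would hold the workflow thread for the whole run.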
Here is the code for a NonblockingNativeActivity class. It has been useful for us, and I hope it helps you as well. Use it as the base class for your activities: instead of overriding Execute, override ExecuteNonblocking. You can also override PrepareToExecute and AfterExecute if you need to work with the Workflow runtime, but those will run single-threaded.
using System;
using System.Text;
using System.Activities.Hosting;
using System.Activities;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Threading;

namespace Sample.Activities
{
    /// <summary>
    /// Base class for native activities that do their work on a thread-pool
    /// thread instead of blocking the workflow thread.
    /// </summary>
    public abstract class NonblockingNativeActivity : NativeActivity
    {
        private Variable<NoPersistHandle> NoPersistHandle { get; set; }
        private Variable<Bookmark> Bookmark { get; set; }

        /// <summary>
        /// Allows the activity to induce idle.
        /// </summary>
        protected override bool CanInduceIdle
        {
            get
            {
                return true;
            }
        }

        /// <summary>
        /// Prepares for execution. Runs on the workflow thread.
        /// </summary>
        /// <param name="context"></param>
        protected virtual void PrepareToExecute(
            NativeActivityContext context)
        {
        }

        /// <summary>
        /// Does the actual work. Runs on a thread-pool thread.
        /// </summary>
        protected abstract void ExecuteNonblocking();

        /// <summary>
        /// Called after the work completes. Runs on the workflow thread.
        /// </summary>
        /// <param name="context"></param>
        protected virtual void AfterExecute(
            NativeActivityContext context)
        {
        }

        /// <summary>
        /// Executes the Activity
        /// </summary>
        /// <param name="context"></param>
        protected override void Execute(NativeActivityContext context)
        {
            //
            // We must enter a NoPersist zone because it looks like we're idle while our
            // Task is executing, but we aren't really
            //
            NoPersistHandle noPersistHandle = NoPersistHandle.Get(context);
            noPersistHandle.Enter(context);

            //
            // Set a bookmark that we will resume when our Task is done. The bookmark and
            // the helper are kept in locals rather than instance fields, because the same
            // activity object is executed once per iteration inside ParallelForEach and
            // fields would be overwritten by the next iteration.
            //
            Bookmark bookmark = context.CreateBookmark(BookmarkResumptionCallback);
            this.Bookmark.Set(context, bookmark);
            BookmarkResumptionHelper helper = context.GetExtension<BookmarkResumptionHelper>();

            //
            // Prepare to execute
            //
            PrepareToExecute(context);

            //
            // Start a Task to do the actual execution of our activity, then resume the
            // bookmark with the finished Task as its value (this assumes that
            // BookmarkResumptionHelper.ResumeBookmark accepts any object as the value)
            //
            CancellationTokenSource tokenSource = new CancellationTokenSource();
            Task task = Task.Factory.StartNew(ExecuteNonblocking, tokenSource.Token);
            task.ContinueWith(t => helper.ResumeBookmark(bookmark, t));
        }

        private void BookmarkResumptionCallback(NativeActivityContext context, Bookmark bookmark, object value)
        {
            var noPersistHandle = NoPersistHandle.Get(context);
            Task task = value as Task;

            if (task != null && task.IsFaulted)
            {
                //
                // The task had a problem
                //
                Console.WriteLine("Exception from ExecuteNonblocking task:");
                Exception ex = task.Exception;
                while (ex != null)
                {
                    Console.WriteLine(ex.Message);
                    ex = ex.InnerException;
                }

                //
                // Exit the no persist handle and rethrow.
                //
                noPersistHandle.Exit(context);
                throw task.Exception;
            }

            AfterExecute(context);
            noPersistHandle.Exit(context);
        }

        //
        // TODO: How do we want to handle cancellations? We can pass a CancellationToken to the task
        // so that we cancel the task, but maybe we can do better than that?
        //

        /// <summary>
        /// Abort Activity
        /// </summary>
        /// <param name="context"></param>
        protected override void Abort(NativeActivityAbortContext context)
        {
            base.Abort(context);
        }

        /// <summary>
        /// Cancels the Activity
        /// </summary>
        /// <param name="context"></param>
        protected override void Cancel(NativeActivityContext context)
        {
            base.Cancel(context);
        }

        /// <summary>
        /// Registers Activity Metadata
        /// </summary>
        /// <param name="metadata"></param>
        protected override void CacheMetadata(NativeActivityMetadata metadata)
        {
            base.CacheMetadata(metadata);
            this.NoPersistHandle = new Variable<NoPersistHandle>();
            this.Bookmark = new Variable<Bookmark>();
            metadata.AddImplementationVariable(this.NoPersistHandle);
            metadata.AddImplementationVariable(this.Bookmark);
            metadata.RequireExtension<BookmarkResumptionHelper>();
            metadata.AddDefaultExtensionProvider<BookmarkResumptionHelper>(() => new BookmarkResumptionHelper());
        }
    }
}
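The class above relies on a BookmarkResumptionHelper extension that the answer does not show. The following is a hypothetical reconstruction, assuming the helper only needs a parameterless constructor and a `ResumeBookmark(Bookmark, object)` method, which is how it is used above. It is a sketch built on the IWorkflowInstanceExtension interface, not the answerer's actual implementation.

```csharp
using System;
using System.Activities;
using System.Activities.Hosting;
using System.Collections.Generic;

namespace Sample.Activities
{
    // Hypothetical reconstruction; the original helper is not shown in the answer.
    public class BookmarkResumptionHelper : IWorkflowInstanceExtension
    {
        private WorkflowInstanceProxy m_Instance;

        // This extension does not need any further extensions.
        public IEnumerable<object> GetAdditionalExtensions()
        {
            return null;
        }

        // Called by the runtime with a proxy for the owning workflow instance.
        public void SetInstance(WorkflowInstanceProxy instance)
        {
            m_Instance = instance;
        }

        // Safe to call from any thread: the resumption is queued and the
        // bookmark callback itself runs later on the workflow thread.
        public void ResumeBookmark(Bookmark bookmark, object value)
        {
            m_Instance.BeginResumeBookmark(
                bookmark,
                value,
                asyncResult => m_Instance.EndResumeBookmark(asyncResult),
                null);
        }
    }
}
```

Because CacheMetadata registers a default extension provider, a host would not need to add this extension explicitly.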
Source: https://stackoverflow.com/questions/31990691/true-concurrency-on-a-collection-in-windows-wf-4-5