Windows Workflow 4 Correlation Query includes website instance name in instance key calculation and fails

大城市里の小女人 submitted on 2020-01-11 12:38:45

Question


I am trying to host a long-running workflow service on Azure, but I am having problems with correlation.
I have set timeToUnload and timeToPersist to 0 and ticked "persist before send" in the workflow. This is not a persistence problem; it has to do with how instance keys are calculated.

When one web server starts a workflow and another server then tries to perform a further action on it, the call fails with:

System.ServiceModel.FaultException: The execution of an InstancePersistenceCommand was interrupted because the instance key '12e0b449-7a71-812d-977a-ab89864a272f' was not associated to an instance. This can occur because the instance or key has been cleaned up, or because the key is invalid. The key may be invalid if the message it was generated from was sent at the wrong time or contained incorrect correlation data.

I used WCF service diagnostics to dig into this and found that the instance key calculation includes the website instance name. As a result, a given workflow instance can only be called back on the machine that instantiated it, because Azure assigns a different website instance name on each role instance.

To explain: when I create a new instance of the workflow, an activity gets the workflow instance Guid, returns it, and uses a correlation initializer to set the correlation handle.

I have enabled Service Tracing in web.config, so in the Service Trace Viewer I can see the following when I create a new workflow instance:

<ApplicationData>
    <TraceData>
        <DataItem>
            <TraceRecord Severity="Information" Channel="Analytic" xmlns="http://schemas.microsoft.com/2004/10/E2ETraceEvent/TraceRecord">
                <TraceIdentifier>225</TraceIdentifier>
                <Description>Calculated correlation key '496e3207-fe9d-919f-b1df-f329c5a64934' using values 'key1:10013d62-286e-4a8f-aeb2-70582591cd7f,' in parent scope '{/NewOrbit.ExVerifier.Web_IN_2_Web/Workflow/Application/}Application_default1.xamlx'.</Description>
                <AppDomain>/LM/W3SVC/1273337584/ROOT-1-129811251826070757</AppDomain>
            </TraceRecord>
        </DataItem>
    </TraceData>
</ApplicationData>

The important line is this:

Calculated correlation key '496e3207-fe9d-919f-b1df-f329c5a64934' using values 'key1:10013d62-286e-4a8f-aeb2-70582591cd7f,' in parent scope '{/NewOrbit.ExVerifier.Web_IN_2_Web/Workflow/Application/}Application_default1.xamlx'.

The Guid of this particular workflow instance is 10013d62-286e-4a8f-aeb2-70582591cd7f, and the workflow engine calculates the "instance key" 496e3207-fe9d-919f-b1df-f329c5a64934 from it. I can see the workflow instance with that Guid in [System.Activities.DurableInstancing].[InstancesTable] and the instance key in [System.Activities.DurableInstancing].[KeysTable]. So far, so good: if the same server makes a later call to that workflow, everything works fine. However, if a different server tries to access the workflow, I get the correlation error mentioned above. Looking at the diagnostics trace again, I can see this:

<TraceData>
    <DataItem>
        <TraceRecord Severity="Information" Channel="Analytic" xmlns="http://schemas.microsoft.com/2004/10/E2ETraceEvent/TraceRecord">
            <TraceIdentifier>225</TraceIdentifier>
            <Description>Calculated correlation key '12e0b449-7a71-812d-977a-ab89864a272f' using values 'key1:10013d62-286e-4a8f-aeb2-70582591cd7f,' in parent scope '{/NewOrbit.ExVerifier.Web_IN_5_Web/Workflow/Application/}Application_default1.xamlx'.</Description>
            <AppDomain>/LM/W3SVC/1273337584/ROOT-1-129811251818669004</AppDomain>
        </TraceRecord>
    </DataItem>
</TraceData>

The important line is:

Calculated correlation key '12e0b449-7a71-812d-977a-ab89864a272f' using values 'key1:10013d62-286e-4a8f-aeb2-70582591cd7f,' in parent scope '{/NewOrbit.ExVerifier.Web_IN_5_Web/Workflow/Application/}Application_default1.xamlx'.

As you can see, the same Guid is passed in, but the system includes the website instance name in the instance key calculation, so it ends up with a completely different instance key.
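To make the observation concrete, here is a small C# sketch that splits the two parent scope names copied from the traces into their first path segment and the remainder. The `SplitScope` helper and the `{/<site>/<path>/}<file>` layout it assumes are hypothetical, inferred only from these two trace lines, not taken from any WCF documentation.

```csharp
using System;
using System.Text.RegularExpressions;

// Parent scope names copied verbatim from the two trace records above.
string scopeIn2 = "{/NewOrbit.ExVerifier.Web_IN_2_Web/Workflow/Application/}Application_default1.xamlx";
string scopeIn5 = "{/NewOrbit.ExVerifier.Web_IN_5_Web/Workflow/Application/}Application_default1.xamlx";

var first = SplitScope(scopeIn2);
var second = SplitScope(scopeIn5);
Console.WriteLine($"Site segments: '{first.Site}' vs '{second.Site}'");
Console.WriteLine($"Remainder identical: {first.Rest == second.Rest}");

// Hypothetical helper: assumes the layout "{/<site name>/<path>/}<service file>",
// which is inferred from these two trace lines only.
static (string Site, string Rest) SplitScope(string scope)
{
    Match match = Regex.Match(scope, @"^\{/(?<site>[^/]+)(?<rest>/.*)$");
    if (!match.Success)
        throw new FormatException($"Unexpected scope format: {scope}");
    return (match.Groups["site"].Value, match.Groups["rest"].Value);
}
```

Run as-is, it reports two different site segments and `Remainder identical: True`: the website name is the only part of the scope that differs between the two servers.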

I created a completely new project to test this and hit exactly the same problem. I feel I must be doing something very simple wrong, as I can't find anyone else reporting it.


Answer 1:


A few months later, I have found a solution to this problem. The root problem is that Azure gives the website a different name on each role instance: rather than "Default Web Site", the website is called something like NewOrbit.ExVerifier.Web_IN_0_Web (given a web project namespace of NewOrbit.ExVerifier.Web). Workflow uses the website name as part of the algorithm that calculates the instance key, hence the problem.

The solution is, quite simply, to rename the website during role startup so that it has the same name on all instances. This fixes the root problem rather than handling its consequences, and it is so obvious that I never saw it the first time round.

Here is how you can do this (loosely based on http://blogs.msdn.com/b/tomholl/archive/2011/06/28/hosting-services-with-was-and-iis-on-windows-azure.aspx):

Configure PowerShell to have elevated access rights so you can make changes after IIS has been configured:

In ServiceDefinition.csdef add a startup task:

<ServiceDefinition name="WasInAzure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1">
      ...
      <Startup>
          <Task commandLine="setup\startup.cmd" executionContext="elevated" />
      </Startup>
  </WebRole>
</ServiceDefinition>

Setup\Startup.cmd should have this content:

powershell -command "set-executionpolicy Unrestricted" >> out.txt

Configure the role's OnStart to run with admin privileges:

In ServiceDefinition.csdef add this:

<ServiceDefinition name="WasInAzure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1">
  ...
    <Runtime executionContext="elevated" />
  </WebRole>
</ServiceDefinition>

Create a PowerShell script to rename the website:

Create a setup\RoleStart.ps1 file:

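write-host "Begin RoleStart.ps1"
import-module WebAdministration
# $args[0] is optional: with no argument (WebRole.cs below passes none) the pattern is "**"
$siteName = "*" + $args[0] + "*"
Get-WebSite $siteName | Foreach-Object {
    $site = $_
    $siteref = "IIS:/Sites/" + $site.Name
    try {
        # -ErrorAction Stop makes a failed rename a terminating error, so the catch block runs
        Rename-Item $siteref 'MyWebSite' -ErrorAction Stop
        write-host "$($site.Name) was renamed"
    }
    catch {
        # inside catch, $_ is the error record
        write-host "Failed to rename $($site.Name): $_"
    }
}
write-host "End RoleStart.ps1"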

(Replace MyWebSite with whatever name you want the website to have on all servers.)

Run RoleStart.ps1 on role start:

Create or edit WebRole.cs in the root of your website project and add this code:

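using System.Diagnostics;
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        var startInfo = new ProcessStartInfo()
        {
            FileName = "powershell.exe",
            Arguments = @".\setup\rolestart.ps1",
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        using (var process = Process.Start(startInfo))
        using (var writer = new StreamWriter("out.txt"))
        {
            // Read the output before waiting: if the script fills the stdout
            // pipe buffer, calling WaitForExit() first can block forever.
            writer.Write(process.StandardOutput.ReadToEnd());
            process.WaitForExit();
        }
        return base.OnStart();
    }
}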

And that should be it. If you spin up multiple web role instances and connect to them with RDP, you should now see that the website has the same name on every instance, and workflow persistence therefore works.




Answer 2:


It appears this is a problem with running workflow services in a web role. It looks like the workaround is to run your workflow services in a worker role, which doesn't have the same problem.




Answer 3:


I'm not familiar with Workflow persistence, but others have reported that they have successfully made SQL Azure work with WF persistence. I suggest checking http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/2dac9194-0067-4e16-8e95-c15a72cb0069/ and http://www.theworkflowelement.com/2011/05/wf-persistence-on-sql-azure.html to see if they help.

Best Regards,

Ming Xu.




Answer 4:


We are new to WF and WCF, but we were wondering whether you could build your own instance store.

That should let you take over how the InstanceKey is handled, so you could calculate your own keys.

There are quite a few examples on the internet.




Answer 5:


After a lot of decompiling and searching, I finally found where the key is generated.

It is in the class System.ServiceModel.Channels.CorrelationKey, in the System.ServiceModel assembly.

The method GenerateKeyString does the trick. Now I have to find a way to override this method and use my own key-generation algorithm, so that the same instance can run on multiple web servers with different names.

private static Guid GenerateKey(string keyString)
{
  return new Guid(HashHelper.ComputeHash(Encoding.Unicode.GetBytes(keyString)));
}

private static string GenerateKeyString(ReadOnlyDictionaryInternal<string, string> keyData, string scopeName, string provider)
{
  if (string.IsNullOrEmpty(scopeName))
    throw DiagnosticUtility.ExceptionUtility.ThrowHelperArgument("scopeName", System.ServiceModel.SR.GetString("ScopeNameMustBeSpecified"));
  if (provider.Length == 0)
    throw DiagnosticUtility.ExceptionUtility.ThrowHelperArgument("provider", System.ServiceModel.SR.GetString("ProviderCannotBeEmptyString"));
  StringBuilder stringBuilder1 = new StringBuilder();
  StringBuilder stringBuilder2 = new StringBuilder();
  SortedList<string, string> sortedList = new SortedList<string, string>((IDictionary<string, string>) keyData, (IComparer<string>) StringComparer.Ordinal);
  stringBuilder2.Append(sortedList.Count.ToString((IFormatProvider) NumberFormatInfo.InvariantInfo));
  stringBuilder2.Append('.');
  for (int index = 0; index < sortedList.Count; ++index)
  {
    if (index > 0)
      stringBuilder1.Append('&');
    stringBuilder1.Append(sortedList.Keys[index]);
    stringBuilder1.Append('=');
    stringBuilder1.Append(sortedList.Values[index]);
    stringBuilder2.Append(sortedList.Keys[index].Length.ToString((IFormatProvider) NumberFormatInfo.InvariantInfo));
    stringBuilder2.Append('.');
    stringBuilder2.Append(sortedList.Values[index].Length.ToString((IFormatProvider) NumberFormatInfo.InvariantInfo));
    stringBuilder2.Append('.');
  }
  if (sortedList.Count > 0)
    stringBuilder1.Append(',');
  stringBuilder1.Append(scopeName);
  stringBuilder1.Append(',');
  stringBuilder1.Append(provider);
  stringBuilder2.Append(scopeName.Length.ToString((IFormatProvider) NumberFormatInfo.InvariantInfo));
  stringBuilder2.Append('.');
  stringBuilder2.Append(provider.Length.ToString((IFormatProvider) NumberFormatInfo.InvariantInfo));
  stringBuilder1.Append('|');
  stringBuilder1.Append((object) stringBuilder2);
  return ((object) stringBuilder1).ToString();
}
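For experimenting outside WCF, the decompiled logic can be reproduced in a few lines. This is a hedged re-implementation, not the framework code: `BuildKeyString` mirrors `GenerateKeyString` above, while `ToKeyGuid` assumes the internal `HashHelper.ComputeHash` behaves like MD5 over the UTF-16 bytes (16 bytes, enough for a `Guid`), which the decompiled snippet does not show. The provider string is hypothetical, so the printed GUIDs are not expected to match the ones in the question's trace. What the sketch does show is that changing only the site segment of the scope changes the key, while identical scope names (for example after renaming the site on every instance) give identical keys.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

var demoKeyData = new Dictionary<string, string> { ["key1"] = "10013d62-286e-4a8f-aeb2-70582591cd7f" };
const string demoProvider = "HypotheticalProvider"; // assumption: the real provider value is not in the trace
string scopeIn2 = "{/NewOrbit.ExVerifier.Web_IN_2_Web/Workflow/Application/}Application_default1.xamlx";
string scopeIn5 = "{/NewOrbit.ExVerifier.Web_IN_5_Web/Workflow/Application/}Application_default1.xamlx";

Guid keyIn2 = ToKeyGuid(BuildKeyString(demoKeyData, scopeIn2, demoProvider));
Guid keyIn5 = ToKeyGuid(BuildKeyString(demoKeyData, scopeIn5, demoProvider));
Console.WriteLine($"IN_2 key: {keyIn2}");
Console.WriteLine($"IN_5 key: {keyIn5}");
Console.WriteLine($"Keys equal: {keyIn2 == keyIn5}");

// Mirrors the decompiled GenerateKeyString: "k1=v1&k2=v2,<scope>,<provider>|"
// followed by the entry count and every key/value/scope/provider length, dot-separated.
static string BuildKeyString(IDictionary<string, string> keyData, string scopeName, string provider)
{
    if (string.IsNullOrEmpty(scopeName))
        throw new ArgumentException("A scope name must be specified.", nameof(scopeName));
    if (string.IsNullOrEmpty(provider))
        throw new ArgumentException("Provider cannot be null or empty.", nameof(provider));

    var values = new StringBuilder();  // sorted key=value pairs, then scope and provider
    var lengths = new StringBuilder(); // entry count plus every length (presumably to disambiguate inputs)
    var sorted = new SortedList<string, string>(keyData, StringComparer.Ordinal);
    lengths.Append(sorted.Count.ToString(CultureInfo.InvariantCulture)).Append('.');
    for (int i = 0; i < sorted.Count; i++)
    {
        if (i > 0) values.Append('&');
        values.Append(sorted.Keys[i]).Append('=').Append(sorted.Values[i]);
        lengths.Append(sorted.Keys[i].Length.ToString(CultureInfo.InvariantCulture)).Append('.');
        lengths.Append(sorted.Values[i].Length.ToString(CultureInfo.InvariantCulture)).Append('.');
    }
    if (sorted.Count > 0) values.Append(',');
    values.Append(scopeName).Append(',').Append(provider);
    lengths.Append(scopeName.Length.ToString(CultureInfo.InvariantCulture)).Append('.');
    lengths.Append(provider.Length.ToString(CultureInfo.InvariantCulture));
    return values.Append('|').Append(lengths.ToString()).ToString();
}

// ASSUMPTION: HashHelper.ComputeHash is modelled as MD5 over the UTF-16 (Encoding.Unicode) bytes.
static Guid ToKeyGuid(string keyString)
{
    using var md5 = MD5.Create();
    return new Guid(md5.ComputeHash(Encoding.Unicode.GetBytes(keyString)));
}
```

Because `GenerateKeyString` is a private static method, this sketch only helps to understand and predict the keys; it does not replace what WCF computes at runtime.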


Source: https://stackoverflow.com/questions/10536983/windows-workflow-4-correlation-query-includes-website-instance-name-in-instance
