Windows Azure - Cleaning Up The WADLogsTable

Posted by 北战南征 on 2019-11-27 19:29:47

The data in tables created by Windows Azure Diagnostics isn't deleted automatically.

However, the Windows Azure PowerShell Cmdlets include a cmdlet specifically for this case.

PS D:\> help Clear-WindowsAzureLog

NAME Clear-WindowsAzureLog

SYNOPSIS Removes Windows Azure trace log data from a storage account.

SYNTAX Clear-WindowsAzureLog [-DeploymentId <String>] [-From <DateTime>]
[-To <DateTime>] [-StorageAccountName <String>] [-StorageAccountKey <String>]
[-UseDevelopmentStorage]
[-StorageAccountCredentials <StorageCredentialsAccountAndKey>]
[<CommonParameters>]

Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>]
[-ToUtc <DateTime>] [-StorageAccountName <String>] [-StorageAccountKey <String>]
[-UseDevelopmentStorage]
[-StorageAccountCredentials <StorageCredentialsAccountAndKey>]
[<CommonParameters>]

Specify the -ToUtc parameter, and all logs before that date will be deleted.

If the cleanup task needs to run on Azure inside a worker role, the C# code of the cmdlets can be reused. The PowerShell Cmdlets are published under the permissive MS Public License.

Basically, only three files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, WadTableServiceEntity.cs.

This is an updated version of Chriseyre2000's function. It performs much better when you need to delete many thousands of records, because it queries by PartitionKey and works through the table in chunks, step by step. And remember that the best choice is to run it close to the storage (in a cloud service).

// Needs: System, System.Collections.Generic, System.Linq,
// Microsoft.WindowsAzure.Storage, Microsoft.WindowsAzure.Storage.Table
public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime,DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");

    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        // move the cutoff forward one step; stop once it passes the finish date
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;
        var l = dt.Ticks;
        string partitionKey = "0" + l;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        query.Select(new string[] {});
        var items = cloudTable.ExecuteQuery(query).ToList();
        // a batch holds at most 100 operations, so chunks must not be larger
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = items.Count - index > chunkSize ? chunkSize : items.Count - index;
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }
        foreach (var chunk in chunkedList)
        {
            // a batch may only touch one partition, so group deletes by PartitionKey
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation {tableOperation});
            }

            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
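
To see how stepFunction drives the loop above, here is a minimal sketch that only lists the cutoff times the function would walk through, without touching storage. The names WadCleanupSteps and Cutoffs are made up for illustration; only the advance-then-compare logic is taken from the code above.

```csharp
using System;
using System.Collections.Generic;

public static class WadCleanupSteps
{
    // Same order of operations as the loop above: advance first,
    // then stop as soon as the cutoff is later than the finish date.
    public static IEnumerable<DateTime> Cutoffs(
        DateTime start, DateTime finish, Func<DateTime, DateTime> stepFunction)
    {
        var dt = start;
        while (true)
        {
            dt = stepFunction(dt);
            if (dt > finish)
                yield break;
            yield return dt;
        }
    }
}
```

With a start of 2019-01-01, a finish of 2019-01-02 and dt => dt.AddHours(6), this yields four cutoffs: 06:00, 12:00, 18:00 and midnight of the second day. Smaller steps mean smaller result sets per query. A step function that never advances the date would loop forever, so it is worth guarding against that.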

You could just do it based on the timestamp but that would be very inefficient since the whole table would need to be scanned. Here is a code sample that might help where the partition key is generated to prevent a "full" table scan.
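
As a rough sketch of how such a partition key can be generated: the other code in this thread builds it as "0" followed by the tick count of the cutoff time, so the helper below does the same. The class and method names are my own, and the key format is assumed from that code rather than from any official documentation.

```csharp
using System;

public static class WadPartitionKey
{
    // Assumption: WADLogsTable partition keys look like "0" + UTC ticks,
    // matching the "0" + dt.Ticks expression used elsewhere in this thread.
    public static string FromUtc(DateTime utc)
    {
        return "0" + utc.Ticks;
    }

    // Filter string that matches every partition older than the cutoff.
    public static string OlderThanFilter(DateTime utc)
    {
        return string.Format("PartitionKey lt '{0}'", FromUtc(utc));
    }
}
```

Because the keys for present-day dates all have the same length, plain string comparison orders them chronologically, which is why a "lt" filter on PartitionKey can stand in for a Timestamp filter.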

Here is a solution that truncates based on a timestamp. (Tested against SDK 2.0)

It does use a table scan to get the data, but if run, say, once per day, that would not be too painful:

    /// <summary>
    /// TruncateDiagnostics(storageAccount, DateTime.Now.AddHours(-1));
    /// </summary>
    /// <param name="storageAccount"></param>
    /// <param name="keepThreshold"></param>
    public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
        try
        {
            CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

            CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

            // filtering on Timestamp forces a scan of the whole table
            TableQuery query = new TableQuery();
            query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
            var items = cloudTable.ExecuteQuery(query).ToList();

            // a batch may only touch one partition, so group deletes by PartitionKey
            Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in items)
            {
                TableOperation tableOperation = TableOperation.Delete(entity);

                if (!batches.ContainsKey(entity.PartitionKey))
                {
                    batches.Add(entity.PartitionKey, new TableBatchOperation());
                }

                batches[entity.PartitionKey].Add(tableOperation);
            }

            foreach (var batch in batches.Values)
            {
                cloudTable.ExecuteBatch(batch);
            }
        }
        catch (Exception ex)
        {
            Trace.TraceError("Truncate WADLogsTable exception {0}", ex);
        }
    }
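
One caveat when grouping deletes by partition: a table batch accepts at most 100 operations, all for the same PartitionKey, so a partition with more than 100 expired rows has to be split across several batches. A minimal sketch of that split, kept free of the storage SDK so it can be tried on its own (BatchPlanner and Plan are hypothetical names, not part of any library):

```csharp
using System;
using System.Collections.Generic;

public static class BatchPlanner
{
    public const int MaxBatchSize = 100; // limit for one table batch

    // Groups items by partition key and cuts each group into lists of at
    // most maxBatchSize items. Each inner list corresponds to one batch.
    public static List<List<T>> Plan<T>(
        IEnumerable<T> items, Func<T, string> partitionKeyOf, int maxBatchSize = MaxBatchSize)
    {
        var open = new Dictionary<string, List<T>>();
        var result = new List<List<T>>();
        foreach (var item in items)
        {
            var key = partitionKeyOf(item);
            List<T> current;
            // start a new batch for an unseen partition or a full batch
            if (!open.TryGetValue(key, out current) || current.Count >= maxBatchSize)
            {
                current = new List<T>();
                open[key] = current;
                result.Add(current);
            }
            current.Add(item);
        }
        return result;
    }
}
```

With the storage SDK, each inner list would be turned into one TableBatchOperation of TableOperation.Delete calls and passed to cloudTable.ExecuteBatch. For 250 rows in one partition and 3 in another, the plan contains four batches of 100, 100, 50 and 3 rows.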

Here's my slightly different version of @Chriseyre2000's solution, using asynchronous operations and PartitionKey querying. It's designed to run continuously within a Worker Role in my case. This one may be a bit easier on memory if you have a lot of entries to clean up.

static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );

                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a week's worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );
                    // materialize once so the query is not executed again by the loop below
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 ).ToList();

                    if ( items.Count == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );

                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                        {
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );
                        }

                        // can have only 100 per batch; the rest is picked up on the next pass
                        if ( batches[entity.PartitionKey].Count < 100)
                        {
                            batches[entity.PartitionKey].Add( tableOperation );
                        }
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                    {
                        await cloudTable.ExecuteBatchAsync( batch );
                    }

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}

To start the process, just call this from the OnStart method in your worker role.

// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();

If you don't care about any of the contents, just delete the table. Azure Diagnostics will just recreate it.
