Windows Azure - Cleaning Up The WADLogsTable


The data in tables created by Windows Azure Diagnostics isn't deleted automatically.

However, the Windows Azure PowerShell Cmdlets include a cmdlet specifically for this purpose.

PS D:\> help Clear-WindowsAzureLog

NAME
    Clear-WindowsAzureLog

SYNOPSIS
    Removes Windows Azure trace log data from a storage account.

SYNTAX
    Clear-WindowsAzureLog [-DeploymentId <String>] [-From <DateTime>] [-To <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

    Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

You need to specify the -ToUtc parameter; all logs before that date will be deleted.
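For example, something along these lines should work (the storage account name and key are placeholders, and the seven-day retention window is only an illustration):

PS D:\> Clear-WindowsAzureLog -StorageAccountName "mystorageaccount" `
                              -StorageAccountKey "<storage-account-key>" `
                              -ToUtc ([DateTime]::UtcNow.AddDays(-7))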

If the cleanup task needs to run on Azure itself, within a worker role, the C# code behind the cmdlets can be reused; the PowerShell Cmdlets are published under the permissive MS Public License.

Basically, only three files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, and WadTableServiceEntity.cs.

Here is an updated version of Chriseyre2000's function. It performs much better when you need to delete many thousands of records, because it searches by PartitionKey and works through the data in chunks, step by step. And remember that the best place to run it is close to the storage account (i.e., in a cloud service).

// Requires the Azure Storage client library:
//   using Microsoft.WindowsAzure.Storage;
//   using Microsoft.WindowsAzure.Storage.Table;
public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");

    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        // advance the upper bound one step at a time
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        // WADLogsTable partition keys are "0" followed by the event time in ticks
        string partitionKey = "0" + dt.Ticks;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        // no user properties are needed for a delete: PartitionKey, RowKey and ETag are enough
        query.Select(new string[] {});
        var items = cloudTable.ExecuteQuery(query).ToList();

        // a table batch operation accepts at most 100 entities, so chunk accordingly
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = items.Count - index > chunkSize ? chunkSize : items.Count - index;
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        foreach (var chunk in chunkedList)
        {
            // all entities in one batch must share the same partition key
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }

            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
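A possible call, as a sketch only (the connection-string name, the time window, and the hourly step are example values, not anything prescribed by the function):

var account = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("CloudStorageConnectionString"));

// walk forward one hour at a time, deleting everything older than 7 days
TruncateDiagnostics(
    account,
    DateTime.UtcNow.AddDays(-30),   // roughly where the oldest logs start
    DateTime.UtcNow.AddDays(-7),    // keep anything newer than this
    dt => dt.AddHours(1));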

You could just do it based on the timestamp, but that would be very inefficient since the whole table would need to be scanned. Here is a code sample that might help, where the partition key is generated to prevent a "full" table scan: http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
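The essence of that approach, as a small sketch (the helper name below is just for illustration): WADLogsTable partition keys are the event time in UTC ticks prefixed with "0", so a partition-key filter can be built from a DateTime instead of filtering on Timestamp.

    // build a WADLogsTable partition-key bound from a UTC time ("0" + ticks)
    static string ToWadPartitionKey(DateTime utc)
    {
        return "0" + utc.Ticks;
    }

    // entities older than keepThreshold are found via a partition range scan,
    // not a full table scan
    string filter = TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.LessThan, ToWadPartitionKey(keepThreshold));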

Here is a solution that truncates based upon a timestamp. (Tested against SDK 2.0.)

It does use a table scan to get the data, but if run, say, once per day it would not be too painful:

    /// <summary>
    /// Example: TruncateDiagnostics(storageAccount, DateTime.UtcNow.AddHours(-1));
    /// </summary>
    /// <param name="storageAccount">Storage account that holds WADLogsTable.</param>
    /// <param name="keepThreshold">Entries older than this UTC time are deleted.</param>
    public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
        try
        {
            CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

            CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

            // filtering on Timestamp forces a table scan, so run this infrequently
            TableQuery query = new TableQuery();
            query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
            var items = cloudTable.ExecuteQuery(query).ToList();

            // group deletes by partition key: a batch may only target one partition
            Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in items)
            {
                TableOperation tableOperation = TableOperation.Delete(entity);

                if (!batches.ContainsKey(entity.PartitionKey))
                {
                    batches.Add(entity.PartitionKey, new TableBatchOperation());
                }

                // a batch accepts at most 100 operations: flush it and start a new one when full
                if (batches[entity.PartitionKey].Count == 100)
                {
                    cloudTable.ExecuteBatch(batches[entity.PartitionKey]);
                    batches[entity.PartitionKey] = new TableBatchOperation();
                }

                batches[entity.PartitionKey].Add(tableOperation);
            }

            foreach (var batch in batches.Values)
            {
                if (batch.Count > 0)
                    cloudTable.ExecuteBatch(batch);
            }
        }
        catch (Exception ex)
        {
            Trace.TraceError("Truncate WADLogsTable exception: {0}", ex);
        }
    }

Here's my slightly different version of @Chriseyre2000's solution, using asynchronous operations and querying by PartitionKey. In my case it's designed to run continuously within a Worker Role. This one may be a bit easier on memory if you have a lot of entries to clean up.

static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
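    // Note: async void is intentional so the method can be fired and forgotten
    // from the worker role's OnStart; all exceptions are caught inside the loop.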
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );

                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a weeks worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    // WADLogsTable partition keys are "0" + ticks, so this avoids a full table scan
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );

                    // materialize the page so the query runs only once per iteration
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 ).ToList();

                    if ( items.Count == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );

                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );

                        // can have only 100 per batch
                        if ( batches[entity.PartitionKey].Count < 100)
                            batches[entity.PartitionKey].Add( tableOperation );
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                        await cloudTable.ExecuteBatchAsync( batch );

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}

To start the process, just call this from the OnStart method in your worker role.

// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();
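
For instance, in the role's entry point (a minimal sketch based on the standard RoleEntryPoint template):

public override bool OnStart()
{
    // fire and forget: the method loops internally and delays one day between runs
    LogHelper.TruncateDiagnosticsAsync();

    return base.OnStart();
}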

If you don't care about any of the contents, just delete the table; Azure Diagnostics will recreate it.
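As a rough sketch using the same storage client as above (note that a freshly deleted table can take a little while before it can be recreated, so expect a short gap in logging):

var table = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");

// drop the whole table; the diagnostics agent recreates it when it next transfers logs
table.DeleteIfExists();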
