Windows Azure - Cleaning Up The WADLogsTable

Submitted by 元气小坏坏 on 2019-11-26 22:48:30

Question


I've read conflicting information as to whether or not the WADLogsTable table used by the DiagnosticMonitor in Windows Azure will automatically prune old log entries.

I'm guessing it doesn't, and will instead grow forever - costing me money. :)

If that's the case, does anybody have a good code sample as to how to clear out old log entries from this table manually? Perhaps based on timestamp? I'd run this code from a worker role periodically.


Answer 1:


The data in tables created by Windows Azure Diagnostics isn't deleted automatically.

However, the Windows Azure PowerShell Cmdlets include a cmdlet specifically for this case.

PS D:\> help Clear-WindowsAzureLog

NAME
    Clear-WindowsAzureLog

SYNOPSIS
    Removes Windows Azure trace log data from a storage account.

SYNTAX
    Clear-WindowsAzureLog [-DeploymentId <String>] [-From <DateTime>] [-To <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

    Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

You need to specify the -ToUtc parameter; all logs before that date will be deleted.
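For example, to delete everything older than a week (the account name and key below are placeholders):

```powershell
Clear-WindowsAzureLog -StorageAccountName "mystorageaccount" `
                      -StorageAccountKey "<your-storage-key>" `
                      -ToUtc ([DateTime]::UtcNow.AddDays(-7))
```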

If the cleanup task needs to run on Azure from within the worker role, the C# code behind these cmdlets can be reused; the PowerShell Cmdlets are published under the permissive MS Public License.

Basically, only three files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, WadTableServiceEntity.cs.




Answer 2:


An updated version of Chriseyre2000's function. It performs much better in cases where you need to delete many thousands of records, by querying on PartitionKey and processing the range in chunks, step by step. And remember that the best choice is to run it near the storage account (in a cloud service).

public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");

    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        // WAD partition keys are "0" + the UTC tick count, so this bound
        // covers every entity older than dt.
        string partitionKey = "0" + dt.Ticks;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        var items = cloudTable.ExecuteQuery(query).ToList();

        // A table batch may contain at most 100 operations, so chunk accordingly.
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = Math.Min(chunkSize, items.Count - index);
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        foreach (var chunk in chunkedList)
        {
            // All operations in one batch must share the same PartitionKey.
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }

            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}



Answer 3:


You could just do it based on the timestamp, but that would be very inefficient, since the whole table would have to be scanned. Here is a code sample that may help, where the partition key is generated so as to avoid a "full" table scan: http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
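The idea behind that sample is that WAD tables store a PartitionKey of "0" followed by the event time's UTC tick count, so lexicographic comparison on PartitionKey matches chronological order and a DateTime cutoff converts directly into a PartitionKey bound. A minimal sketch (the helper name is my own):

```csharp
using System;

class WadPartitionKeyDemo
{
    // WADLogsTable partition keys are "0" + the UTC tick count of the entry.
    static string ToWadPartitionKey(DateTime utc) => "0" + utc.Ticks;

    static void Main()
    {
        var cutoff = new DateTime(2013, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        // Usable as the value in a "PartitionKey lt ..." filter condition.
        Console.WriteLine(ToWadPartitionKey(cutoff)); // prints "0634925952000000000"
    }
}
```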




Answer 4:


Here is a solution that truncates based on a timestamp. (Tested against SDK 2.0.)

It does use a table scan to get the data, but if run, say, once per day, it would not be too painful:

    /// <summary>
    /// Deletes WADLogsTable entries older than the given threshold, e.g.:
    /// TruncateDiagnostics(storageAccount, DateTime.Now.AddHours(-1));
    /// </summary>
    /// <param name="storageAccount">Storage account that contains WADLogsTable.</param>
    /// <param name="keepThreshold">Entries with a Timestamp before this value are deleted.</param>
    public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
        try
        {

            CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

            CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

            TableQuery query = new TableQuery();
            // Timestamp is stored in UTC, so keepThreshold should be a UTC value.
            query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
            var items = cloudTable.ExecuteQuery(query).ToList();

            Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in items)
            {
                TableOperation tableOperation = TableOperation.Delete(entity);

                if (!batches.ContainsKey(entity.PartitionKey))
                {
                    batches.Add(entity.PartitionKey, new TableBatchOperation());
                }

                // A batch may contain at most 100 operations, so flush full batches early.
                if (batches[entity.PartitionKey].Count == 100)
                {
                    cloudTable.ExecuteBatch(batches[entity.PartitionKey]);
                    batches[entity.PartitionKey] = new TableBatchOperation();
                }

                batches[entity.PartitionKey].Add(tableOperation);
            }

            foreach (var batch in batches.Values)
            {
                cloudTable.ExecuteBatch(batch);
            }

        }
        catch (Exception ex)
        {
            Trace.TraceError("Truncate WADLogsTable exception: {0}", ex);
        }
    }



Answer 5:


Here's my slightly different version of @Chriseyre2000's solution, using asynchronous operations and querying by PartitionKey. In my case it's designed to run continuously within a worker role. This one may be a bit easier on memory if you have a lot of entries to clean up.

static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );

                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a weeks worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );
                    // Materialize the results so the query executes only once.
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 ).ToList();

                    if ( items.Count == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );

                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );

                        // can have only 100 per batch
                        if ( batches[entity.PartitionKey].Count < 100)
                            batches[entity.PartitionKey].Add( tableOperation );
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                        await cloudTable.ExecuteBatchAsync( batch );

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}

To start the process, just call this from the OnStart method in your worker role.

// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();



Answer 6:


If you don't care about any of the contents, just delete the table; Azure Diagnostics will simply recreate it.
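A minimal sketch of that approach, using the same storage SDK as the answers above (the development storage account here is a placeholder for your real one):

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class DropWadLogsTable
{
    static void Main()
    {
        // Placeholder account; substitute your real storage account.
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudTable table = account.CreateCloudTableClient().GetTableReference("WADLogsTable");

        // Deleting the table removes all log entries at once. Note the delete
        // completes asynchronously on the service side: attempts to recreate
        // the table too soon (including by the diagnostics agent) can receive
        // 409 Conflict until the deletion finishes.
        table.DeleteIfExists();
    }
}
```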



Source: https://stackoverflow.com/questions/6905240/windows-azure-cleaning-up-the-wadlogstable
