Remove Azure Blob Storage contents that are untouched for a period of time

99封情书 submitted on 2019-12-11 02:25:11

Question


The application I developed lets users upload content, which is stored in Azure Blob Storage.

Since the content is meant for quick sharing between users, much of it goes untouched after a short time, although some of it is used over and over again.

To stop the blob storage from growing without bound, I am planning to write a tool that finds blobs that haven't been used for a period of time and deletes them from storage.

On a standard file system, I could use "Last Access Time" to tell when a file was last used. However, I can't find a similar blob property that gives the last access time.

Has anyone come across this situation, and what would be the best way to handle it? Or am I too concerned about this?

Any feedback or suggestions are greatly appreciated.

Thank you in advance.


Answer 1:


I can only see two ways of handling this:

  1. Front access to the blobs with a service that callers must hit to get a blob URL with a SAS token. This way you can count and monitor which blobs are being accessed, and remove older blobs that have had little or no access after some time. This requires turning off public access so that people cannot bypass the service and the SAS token.
  2. Turn on storage analytics and monitor the GET requests. You would have to parse all the GET requests for, say, a month ($logs are updated hourly) and group them by resource. If you automated this, it would not be too terrible. This would give you a list of all the resources that had been accessed.
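The bookkeeping behind option 1 can be sketched in a few lines of Python. This is only an illustration under assumptions: `AccessTracker`, `record_access`, and `stale_blobs` are hypothetical names, the timestamps live in memory rather than in a durable store, and the fronting service is assumed to call `record_access` both when a blob is uploaded and every time it hands out a SAS URL.

```python
from datetime import datetime, timedelta, timezone


class AccessTracker:
    """Remembers when each blob was last handed out by the fronting service.

    Hypothetical helper for illustration; a real service would persist
    these timestamps (for example in a table or database).
    """

    def __init__(self):
        self._last_access = {}  # blob name -> datetime of last recorded access

    def record_access(self, blob_name, when=None):
        # Call on upload and whenever a SAS URL is issued for blob_name.
        self._last_access[blob_name] = when or datetime.now(timezone.utc)

    def stale_blobs(self, all_blob_names, max_idle, now=None):
        # A blob is stale if it has no recorded access at all, or if its
        # last recorded access is older than the cutoff.
        now = now or datetime.now(timezone.utc)
        cutoff = now - max_idle
        return sorted(
            name
            for name in all_blob_names
            if self._last_access.get(name) is None
            or self._last_access[name] < cutoff
        )
```

A cleanup job would pass the container's blob names and something like `timedelta(days=30)` to `stale_blobs`, then delete whatever comes back.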



Answer 2:


If you are using Blob storage, following the approach that Gaurav suggested is your best option. See this doc for getting started:

https://azure.microsoft.com/en-us/documentation/articles/storage-analytics/

Note that our .NET client libraries do include support for parsing log files - you can see a demo of this in our client library unit tests:

https://github.com/Azure/azure-storage-net/search?utf8=%E2%9C%93&q=ListLogs
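If you would rather parse the logs yourself, the grouping step might look roughly like the Python sketch below. It assumes the version 1.0 Storage Analytics log format, in which each entry is one semicolon-delimited line with the operation type in the third field and the requested object key in the thirteenth; verify the field positions against the log format documentation before relying on them. `count_blob_reads` is a hypothetical helper name, and downloading the log blobs from `$logs` is left out.

```python
import csv
from collections import Counter

# Zero-based field positions, assuming the version 1.0 log format.
OPERATION_TYPE = 2         # e.g. "GetBlob"
REQUESTED_OBJECT_KEY = 12  # e.g. "/account/container/blob"


def count_blob_reads(log_lines):
    """Count GetBlob requests per requested object key.

    log_lines is any iterable of log entry strings, one entry per line.
    """
    reads = Counter()
    # Fields are separated by semicolons; quoted fields (such as the
    # request URL) may themselves contain semicolons, so use csv.reader.
    for fields in csv.reader(log_lines, delimiter=";", quotechar='"'):
        if len(fields) <= REQUESTED_OBJECT_KEY:
            continue  # blank or truncated line
        if fields[OPERATION_TYPE] == "GetBlob":
            reads[fields[REQUESTED_OBJECT_KEY]] += 1
    return reads
```

Any blob whose key never appears in the counter for the period you scanned is a candidate for deletion.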




Answer 3:


This is much more straightforward now with Azure Blob Storage support for lifecycle management.

Manage the Azure Blob storage lifecycle

Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage accounts. Use the policy to transition your data to the appropriate access tiers or expire at the end of the data's lifecycle.

The lifecycle management policy lets you:

  • Transition blobs to a cooler storage tier (hot to cool, hot to archive, or cool to archive) to optimize for performance and cost
  • Delete blobs at the end of their lifecycles
  • Define rules to be run once per day at the storage account level
  • Apply rules to containers or a subset of blobs (using prefixes as filters)
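A lifecycle policy is a JSON document of rules. As a rough sketch, the Python below builds a single delete rule and prints it as JSON. The rule name, the `uploads/` prefix, and the 90-day threshold are made-up values, and `build_delete_policy` is a hypothetical helper. Note that `daysAfterModificationGreaterThan` counts from the last write, not the last read; lifecycle management also has a last-access-time condition that depends on access tracking being enabled on the account, so check the current documentation if you need read-based expiry.

```python
import json


def build_delete_policy(rule_name, prefix, delete_after_days):
    """Return a lifecycle policy (as a dict) that deletes block blobs under
    `prefix` once they have gone `delete_after_days` days without being
    modified."""
    return {
        "rules": [
            {
                "name": rule_name,
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            }
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    # Example values only; adjust the name, prefix, and age to your account.
    policy = build_delete_policy("expire-shared-uploads", "uploads/", 90)
    print(json.dumps(policy, indent=2))
```

The printed JSON can then be supplied as the account's lifecycle management policy through the portal or your usual tooling.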




Answer 4:


You can read the last modified date of a block or page blob from Properties.LastModifiedUtc. Get a blob reference with GetBlockBlobReference or GetPageBlobReference, call FetchAttributes() to populate its properties, and then read LastModifiedUtc. Note that this is the time of the last write to the blob, not the last read.

For example, for a block blob:

CloudBlockBlob blockBlob = container_name.GetBlockBlobReference(uri.ToString());
blockBlob.FetchAttributes();
// blockBlob.Properties.LastModifiedUtc will return the last modified date for the blob.
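
Once the last modified dates are available, picking out the old blobs is a simple filter. The sketch below is in Python rather than C# and works on any objects that expose `name` and `last_modified`; `BlobInfo` is a hypothetical stand-in for whatever listing entries your SDK returns, and `blobs_older_than` is an illustrative name, not a library call.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for an SDK blob listing entry.
BlobInfo = namedtuple("BlobInfo", ["name", "last_modified"])


def blobs_older_than(blobs, max_age, now=None):
    """Return the names of blobs whose last modification is older than max_age.

    Last modified reflects writes only, so a blob that is read often but
    never rewritten will still be reported here.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [b.name for b in blobs if b.last_modified < cutoff]
```

Because reads do not update the last modified date, this filter suits content that is safe to drop by age alone.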


Source: https://stackoverflow.com/questions/12101780/remove-azure-blob-storage-contents-that-are-untouch-for-period-of-time
