Question
Goal: Use a script to go through 5 to 10 million XML files, check each file's date, and delete the file if it is older than 90 days. The script will run daily.
Problem: Using PowerShell's Get-ChildItem -Recurse causes the script to lock up and fail to delete any files. I assume this is because Get-ChildItem needs to build the whole array before taking any action on any file.
Solution?: After a lot of research, I found that [System.IO.Directory]::EnumerateFiles lets you act on items as they are enumerated, before the whole collection has been built, so it should be more efficient (https://msdn.microsoft.com/library/dd383458%28v=vs.100%29.aspx). After more testing, I also found that the foreach statement, foreach ($item in $collection) { }, is more efficient than piping to ForEach-Object, $collection | % { }.
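A rough way to check the foreach-versus-ForEach-Object claim on a given machine is to time both loops over the same input with Measure-Command. This is only a sketch: the input size and the summing work are arbitrary choices, and the timings will vary between machines and PowerShell versions.

```powershell
# Time the foreach statement against the ForEach-Object cmdlet on identical work.
# A hashtable holds the running totals so the script blocks mutate a shared
# object instead of assigning variables (which would depend on scoping rules).
$numbers = 1..100000
$result  = @{ Statement = [long]0; Pipeline = [long]0 }

# foreach statement: the loop body runs inline, with no pipeline involved
$statementTime = Measure-Command {
    foreach ($n in $numbers) { $result.Statement += $n }
}

# ForEach-Object: each item is passed down the pipeline to a script block
$pipelineTime = Measure-Command {
    $numbers | ForEach-Object { $result.Pipeline += $_ }
}

"foreach statement : {0:N0} ms" -f $statementTime.TotalMilliseconds
"ForEach-Object    : {0:N0} ms" -f $pipelineTime.TotalMilliseconds
```

Both totals should be identical; only the two printed timings differ, and their ratio shows the pipeline overhead on that machine.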
Before I run this new code and potentially crash the server again, can anyone suggest adjustments or a more efficient way to script this?
For testing, I created 15,000 tiny (0.02 KB) .txt files filled with random data, in 15,000 directories, and ran the deletion code below. Just for the test, I set the $date variable to 90 seconds ago instead of 90 days. It took 6 seconds to delete all the .txt files.
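For anyone reproducing that test setup, a tree like the one described (one tiny random .txt file per directory) could be generated with a sketch like this. The function name New-TestFileTree and its parameters are hypothetical, not something from the question.

```powershell
# Hypothetical helper (not the asker's script): creates $Count directories under
# $Root, each containing one small .txt file of random characters.
function New-TestFileTree {
    param(
        [Parameter(Mandatory = $true)] [string] $Root,
        [int] $Count = 15000,
        [int] $Length = 20   # characters per file, roughly 0.02 KB
    )
    $rng   = New-Object System.Random
    $chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    for ($i = 0; $i -lt $Count; $i++) {
        # One directory per file, as in the test described above
        $dir = [System.IO.Path]::Combine($Root, "dir$i")
        [void][System.IO.Directory]::CreateDirectory($dir)

        # Build the random file contents character by character
        $sb = New-Object System.Text.StringBuilder
        for ($j = 0; $j -lt $Length; $j++) {
            [void]$sb.Append($chars[$rng.Next($chars.Length)])
        }
        [System.IO.File]::WriteAllText([System.IO.Path]::Combine($dir, "file$i.txt"), $sb.ToString())
    }
}
```

Example: New-TestFileTree -Root 'C:\temp' -Count 15000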
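# Lazily enumerate every .txt file under C:\temp, including subdirectories
$getfiles = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories")
# Cutoff: 90 seconds ago for this test (use .AddDays(-90) for real runs)
$date = ([System.DateTime]::Now).AddSeconds(-90)
foreach ($file in $getfiles) {
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    } #if
} #foreach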
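Given the worry about crashing the server, one cautious option is to wrap the same loop so it can first run in a count-only mode and so a single undeletable file does not abort the run. This is a sketch under assumptions: the function name Remove-OldFile, the -DryRun switch, and the returned counters are all hypothetical additions, not part of the question's code.

```powershell
# Hypothetical wrapper around the loop above. It keeps the streaming
# EnumerateFiles walk, adds a -DryRun switch that only counts matching files,
# and catches per-file errors so one locked file does not end the run.
function Remove-OldFile {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [string] $Filter = '*.xml',
        [int] $Days = 90,
        [switch] $DryRun
    )
    $cutoff  = [System.DateTime]::Now.AddDays(-$Days)
    $matched = 0
    $deleted = 0
    $failed  = 0

    $files = [System.IO.Directory]::EnumerateFiles($Path, $Filter, [System.IO.SearchOption]::AllDirectories)
    foreach ($file in $files) {
        if ([System.IO.File]::GetLastWriteTime($file) -le $cutoff) {
            $matched++
            if ($DryRun) { continue }   # count only, delete nothing
            try {
                [System.IO.File]::Delete($file)
                $deleted++
            }
            catch {
                $failed++               # e.g. file in use or access denied
            }
        }
    }

    # Summary object, handy for logging a daily scheduled run
    [pscustomobject]@{ Matched = $matched; Deleted = $deleted; Failed = $failed }
}
```

Running Remove-OldFile -Path 'C:\temp' -DryRun first shows how many files would be removed before anything is actually deleted.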
Answer 1:
A PowerShell one-liner that examines the first 100,000 files in a folder and deletes the ones last modified more than 90 days ago:
[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") | select -first 100000 | where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } | foreach { rm $_ }
Or, with progress shown:
[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") | select -first 100000 | where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } | foreach { $c = 0 } { Write-Progress -Activity "Delete Files" -CurrentOperation $_ -PercentComplete ((++$c/100000)*100); rm $_ }
This works on folders that have a very large number of files. Thanks to my co-worker Doug!
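The one-liner evaluates (Get-Date).AddDays(-90) once for every file it checks. A function form that computes the cutoff once could look like the sketch below; the name Remove-OldFileBatch and its parameters are hypothetical, and using Remove-Item -LiteralPath instead of the rm alias is an added assumption so that file names containing [ or ] are not treated as wildcards.

```powershell
# Hypothetical function form of the one-liner above. Same scope (top-level
# folder only, at most $BatchSize files examined per run), but the cutoff date
# is computed once, and -LiteralPath keeps [ and ] in names from acting as wildcards.
function Remove-OldFileBatch {
    param(
        [Parameter(Mandatory = $true)] [string] $Folder,
        [int] $BatchSize = 100000,
        [int] $Days = 90
    )
    $cutoff = (Get-Date).AddDays(-$Days)

    [IO.Directory]::EnumerateFiles($Folder) |
        Select-Object -First $BatchSize |
        Where-Object { [IO.File]::GetLastWriteTime($_) -lt $cutoff } |
        ForEach-Object { Remove-Item -LiteralPath $_ }
}
```

Example: Remove-OldFileBatch -Folder 'C:\FOLDER_WITH_FILES_TO_DELETE'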
Answer 2:
You may be able to speed it up a little by filtering the $getfiles collection completely before starting to delete files.
In PowerShell 4.0 and newer, you can do this without using the pipeline (which does add some overhead) by using the intrinsic .Where({}) method:
$date = (Get-Date).AddDays(-90)
$files = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date})
foreach($file in $files)
{
[System.IO.File]::Delete($file)
}
Since you don't seem to care about it anyway, a final, minuscule optimization may be had by waiving error handling completely and calling the Windows API directly:
$Kernel32Util = Add-Type -MemberDefinition @'
[DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool DeleteFile(string filePath);
'@ -Name 'Kernel32Util' -Namespace 'NativeCode' -PassThru
Then do the same as above, using your new external function wrapper instead of [File]::Delete():
foreach($file in $files)
{
[void]$Kernel32Util::DeleteFile($file)
}
At this point though, I would probably take a step back and ask the question:
"Am I using the right tool for the job?"
My (personal) answer would be: "Probably not" - time to write a small utility in a compiled language (C#, F#, VB.NET) instead.
PowerShell is super powerful and useful, but at the cost of performance - that's not a bad thing - it's just something worth taking into account when deciding on what tool to use for a specific task :)
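A possible middle ground, which is a swapped-in technique rather than something this answer proposes, is to write just the delete loop in C# and compile it in-process with Add-Type, so the per-file work runs as compiled code while the script stays in PowerShell. The class OldFileCleaner and its method DeleteOlderThan are hypothetical names used only for this sketch.

```powershell
# Sketch: compile a small C# routine in-process with Add-Type.
# OldFileCleaner / DeleteOlderThan are hypothetical names.
Add-Type -TypeDefinition @'
using System;
using System.IO;

public static class OldFileCleaner
{
    // Deletes files under root matching pattern whose last write time is on or
    // before (now - days). Returns the number of files actually deleted.
    public static long DeleteOlderThan(string root, string pattern, int days)
    {
        DateTime cutoff = DateTime.Now.AddDays(-days);
        long deleted = 0;
        foreach (string file in Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories))
        {
            if (File.GetLastWriteTime(file) <= cutoff)
            {
                try
                {
                    File.Delete(file);
                    deleted++;
                }
                catch (IOException) { }                  // e.g. file in use: skip it
                catch (UnauthorizedAccessException) { }  // no permission: skip it
            }
        }
        return deleted;
    }
}
'@
```

Example: [OldFileCleaner]::DeleteOlderThan('C:\temp', '*.xml', 90) returns how many files were removed.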
Answer 3:
I ended up with slightly different code for different versions of PowerShell:
# PowerShell 4.0 and newer (the .Where() method is available)
$date = ([System.DateTime]::Now).AddDays(-30)
foreach ($file in ([System.IO.Directory]::EnumerateFiles("D:\Folder to cleanup", "*.*", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date}))) {
    [System.IO.File]::Delete($file)
} #foreach
# PowerShell 3.0 (runs on .NET 4.0, so EnumerateFiles exists, but there is no .Where() method)
$date = ([System.DateTime]::Now).AddDays(-30)
foreach ($file in ([System.IO.Directory]::EnumerateFiles("D:\Folder to cleanup", "*.*", "AllDirectories"))) {
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    } #if
} #foreach
# PowerShell 2.0 (.NET 2.0/3.5 has no EnumerateFiles, so GetFiles builds the full array first)
$date = ([System.DateTime]::Now).AddDays(-30)
foreach ($file in ([System.IO.Directory]::GetFiles("D:\Folder to cleanup", "*.*", "AllDirectories"))) {
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    } #if
} #foreach
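If one script has to run on several of those versions, the choice could be made at run time from $PSVersionTable. This is a sketch assuming the version boundaries above (4.0+ has .Where(), 3.0 has EnumerateFiles, 2.0 only GetFiles); the function name Remove-OldFileByVersion is hypothetical.

```powershell
# Hypothetical dispatcher: picks one of the three variants above at run time.
function Remove-OldFileByVersion {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [int] $Days = 30
    )
    $date  = [System.DateTime]::Now.AddDays(-$Days)
    $all   = [System.IO.SearchOption]::AllDirectories
    $major = $PSVersionTable.PSVersion.Major

    if ($major -ge 4) {
        # 4.0+: filter first with the intrinsic .Where() method, then delete
        $old = [System.IO.Directory]::EnumerateFiles($Path, '*.*', $all).Where({ [System.IO.File]::GetLastWriteTime($_) -le $date })
        foreach ($file in $old) { [System.IO.File]::Delete($file) }
    }
    elseif ($major -eq 3) {
        # 3.0: stream with EnumerateFiles and test each file inside the loop
        foreach ($file in [System.IO.Directory]::EnumerateFiles($Path, '*.*', $all)) {
            if ([System.IO.File]::GetLastWriteTime($file) -le $date) { [System.IO.File]::Delete($file) }
        }
    }
    else {
        # 2.0: GetFiles returns the complete array before the loop starts
        foreach ($file in [System.IO.Directory]::GetFiles($Path, '*.*', $all)) {
            if ([System.IO.File]::GetLastWriteTime($file) -le $date) { [System.IO.File]::Delete($file) }
        }
    }
}
```

Example: Remove-OldFileByVersion -Path 'D:\Folder to cleanup' -Days 30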
Source: https://stackoverflow.com/questions/35386674/the-most-efficient-way-to-delete-millions-of-files-based-on-modified-date-in-wi