Removing duplicate files with PowerShell

渐次进展 2021-02-14 21:39

I have several thousand duplicate files (JAR files, for example) that I'd like to use PowerShell to:

  1. Search through the file system recursively
  2. Find the
4 Answers
  •  太阳男子
    2021-02-14 22:23

    Instead of just removing your duplicate files, you can replace each one with a shortcut to the original:

    #requires -version 3
    <#
        .SYNOPSIS
        Duplicate cleanup script
        .DESCRIPTION
        Finds candidate duplicates by size, compares their MD5 checksums, and groups them by size and MD5.
        Can replace each duplicate with a shortcut to the first file (the original).
    
        .PARAMETER Path
        Path in which to search for duplicates.
    
        .PARAMETER ReplaceByShortcut
        If specified, duplicates are replaced by shortcuts.
    
        .PARAMETER MinLength
        Ignores files smaller than this size (in bytes).
    
        .EXAMPLE
        .\Clean-Duplicate '\\dfs.adds\donnees\commun'
    
        .EXAMPLE
        Find duplicates of 10 KB and larger
        .\Clean-Duplicate '\\dfs.adds\donnees\commun' -MinLength 10000
    
        .EXAMPLE
        .\Clean-Duplicate '\\dpm1\d$\Coaxis\Logiciels' -ReplaceByShortcut
    #>
    [CmdletBinding()]
    param (
        [string]$Path = '\\Contoso.adds\share$\path\data',
        [switch]$ReplaceByShortcut = $false,
        [int]$MinLength = 10*1024*1024 # 10 MB
    )
    
    $version = '1.0'
    
    function Create-ShortCut ($ShortcutPath, $shortCutName, $Target) {
        $link = "$ShortcutPath\$shortCutName.lnk"
        $WshShell = New-Object -ComObject WScript.Shell
        $Shortcut = $WshShell.CreateShortcut($link)
        $Shortcut.TargetPath = $Target
        #$Shortcut.Arguments ="shell32.dll,Control_RunDLL hotplug.dll"
        #$Shortcut.IconLocation = "hotplug.dll,0"
        $Shortcut.Description ="Duplicate copy"
        #$Shortcut.WorkingDirectory ="C:\Windows\System32"
        $Shortcut.Save()
        # write-host -fore Cyan $link -nonewline; write-host -fore Red ' >> ' -nonewline; write-host -fore Yellow $Target 
        return $link
    }
    
    function Replace-ByShortcut {
        Param(
            [Parameter(ValueFromPipeline=$true,ValueFromPipelineByPropertyName=$true)]
                $SameItems
        )
        begin{
            $result = [pscustomobject][ordered]@{
                Replaced = @()
                Gain = 0
                Count = 0
            }
        }
        Process{
            $Original = $SameItems.group[0]
            foreach ($doublon in $SameItems.group) {
                if ($doublon -ne $Original) {
                    $result.Replaced += [pscustomobject][ordered]@{
                        lnk = Create-Shortcut -ShortcutPath $doublon.DirectoryName -shortCutName $doublon.BaseName -Target $Original.FullName
                        target = $Original.FullName
                        size = $doublon.Length
                    }
                    $result.Gain += $doublon.Length
                    $result.Count++
                    # -LiteralPath keeps names containing [ or ] from being treated as wildcards
                    Remove-Item -LiteralPath $doublon.FullName -Force
                }
            }
        }
        End{
            $result
        }
    }
    
    function Get-MD5 {
        param (
            [Parameter(Mandatory)]
                [string]$Path
        )
        $HashAlgorithm = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
        $Stream = [System.IO.File]::OpenRead($Path)
        try {
            $HashByteArray = $HashAlgorithm.ComputeHash($Stream)
        } finally {
            $Stream.Dispose()
        }
    
        return [System.BitConverter]::ToString($HashByteArray).ToLowerInvariant() -replace '-',''
    }
    
    if (-not $Path) {
        if ((Get-Location).Provider.Name -ne 'FileSystem') {
            Write-Error 'Specify a file system path explicitly, or change the current location to a file system path.'
            return
        }
        $Path = (Get-Location).ProviderPath
    }
    
    $DuplicateFiles = Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.Length -ge $MinLength } |
            Group-Object -Property Length |
                Where-Object { $_.Count -gt 1 } |
                    ForEach-Object {
                        $_.Group |
                            ForEach-Object {
                                $_ | Add-Member -MemberType NoteProperty -Name ContentHash -Value (Get-MD5 -Path $_.FullName)
                            }
                        $_.Group |
                            Group-Object -Property ContentHash |
                                Where-Object { $_.Count -gt 1 }
                    }
    
    $somme = ($DuplicateFiles.group | Measure-Object length -Sum).sum
    write-host "$($DuplicateFiles.group.count) files in duplicate groups, totaling $($somme/1024/1024) MB" -fore cyan
    
    if ($ReplaceByShortcut) {
        $DuplicateFiles | Replace-ByShortcut
    } else {
        $DuplicateFiles
    }
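
On PowerShell 4.0 and later, the built-in `Get-FileHash` cmdlet can stand in for the custom `Get-MD5` function. The following is a minimal sketch of the same size-then-hash grouping, not a replacement for the script above. The function name `Find-DuplicateFile` and the shape of its output objects are assumptions made for illustration.

```powershell
# Hypothetical helper (not part of the script above): groups files by size,
# then by MD5 hash, and emits one object per set of identical files.
function Find-DuplicateFile {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path,
        [long]$MinLength = 0
    )

    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Where-Object { $_.Length -ge $MinLength } |
        Group-Object -Property Length |
        Where-Object { $_.Count -gt 1 } |      # hash only sizes shared by 2+ files
        ForEach-Object {
            $_.Group |
                ForEach-Object { Get-FileHash -LiteralPath $_.FullName -Algorithm MD5 } |
                Group-Object -Property Hash |
                Where-Object { $_.Count -gt 1 } |
                ForEach-Object {
                    [pscustomobject]@{
                        Hash  = $_.Name
                        Files = @($_.Group.Path | Sort-Object)
                    }
                }
        }
}
```

To preview a deletion without touching anything, one option is to skip the first file of each set and pass the rest to `Remove-Item` with `-WhatIf`, for example `Find-DuplicateFile -Path $Path | ForEach-Object { $_.Files | Select-Object -Skip 1 } | ForEach-Object { Remove-Item -LiteralPath $_ -WhatIf }`. Which copy counts as "the original" is arbitrary here (first in sorted order), so review the list before dropping `-WhatIf`.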
    
