Delete a large number (>100K) of files with C# whilst maintaining performance in a web application?

[愿得一人] 2020-12-30 04:40

I am trying to remove a large number of files from a location (by large I mean over 100,000), where the action is initiated from a web page. Obviously I cou

10 answers
  • 2020-12-30 05:12

    Can you put all your files in the same directory?

    If so, why don't you just call Directory.Delete(string,bool) on the subdir you want to delete?

    If you've already got a list of file paths you want to get rid of, you might actually get better results by moving them to a temp dir then deleting them rather than deleting each file manually.
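
    A minimal sketch of both suggestions. The class and method names (BulkDelete, DeleteTree, MoveThenDelete) and the scratch-directory parameter are made up for illustration; only Directory.Delete(string, bool), Directory.CreateDirectory and File.Move come from the framework. It assumes the scratch directory sits on the same volume as the files, so that each move is a rename rather than a copy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BulkDelete
{
    // Removes a directory and everything below it in one framework call.
    public static void DeleteTree(string directory)
    {
        if (Directory.Exists(directory))
        {
            Directory.Delete(directory, recursive: true);
        }
    }

    // Moves the listed files into a scratch directory, then removes that
    // directory in one call.
    public static void MoveThenDelete(IEnumerable<string> files, string scratchDirectory)
    {
        Directory.CreateDirectory(scratchDirectory);
        var index = 0;
        foreach (var file in files)
        {
            // A counter prefix avoids clashes between files that share a name.
            var target = Path.Combine(scratchDirectory, index++ + "_" + Path.GetFileName(file));
            File.Move(file, target);
        }
        Directory.Delete(scratchDirectory, recursive: true);
    }
}
```

    Whether the move-then-delete variant is actually faster depends on the file system, so it is worth measuring both on the real data.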

    Cheers, Florian

  • 2020-12-30 05:14

    Boot the work out to a worker thread and then return your response to the user.

    I'd set an application-level flag to say that "the big delete job" is in progress, so that multiple threads don't end up doing the same work. You could then poll another page that reports progress, such as the number of files removed so far, if you wanted to.
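
    A rough sketch of that idea, assuming a static class is acceptable as the "application variable". DeleteJob, TryStart, IsRunning and DeletedSoFar are hypothetical names; a progress page would simply read the two properties. Note that a plain pool thread can be lost if the application pool recycles in the middle of the job.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class DeleteJob
{
    private static int _running;   // 0 = idle, 1 = a delete job is active
    private static int _deleted;   // files removed so far by the current job

    public static bool IsRunning { get { return Volatile.Read(ref _running) == 1; } }
    public static int DeletedSoFar { get { return Volatile.Read(ref _deleted); } }

    // Starts the job on a pool thread and returns its task, or returns null
    // when a job is already running.
    public static Task TryStart(string directory)
    {
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
        {
            return null;
        }
        Interlocked.Exchange(ref _deleted, 0);
        return Task.Run(() =>
        {
            try
            {
                foreach (var file in Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories))
                {
                    File.Delete(file);
                    Interlocked.Increment(ref _deleted);
                }
            }
            finally
            {
                // Clear the flag even if a delete throws, so a new job can start.
                Interlocked.Exchange(ref _running, 0);
            }
        });
    }
}
```

    The request that triggers the delete calls TryStart and returns immediately; a null result means a job is already in progress.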

    Just a query but why so many files?

  • 2020-12-30 05:19

    I know it's an old thread, but in addition to Jan Jongboom's answer I'd like to propose a similar solution that performs well and is more general. My solution was built to quickly remove a directory structure on DFS, with support for long file names (>255 chars). The first difference is in the DLL import declarations.

    // Namespaces used by the snippets in this answer:
    // System, System.Collections.Generic, System.ComponentModel, System.IO, System.Linq,
    // System.Runtime.InteropServices (FILETIME is in System.Runtime.InteropServices.ComTypes),
    // System.Threading.Tasks.

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, ref WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, ref WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool DeleteFile(string lpFileName);

    // kernel32 exports this function as RemoveDirectory; it is bound here under the
    // name DeleteDirectory so it does not clash with the public RemoveDirectory wrapper.
    [DllImport("kernel32.dll", EntryPoint = "RemoveDirectoryW", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool DeleteDirectory(string lpPathName);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern uint GetFileAttributes(string lpFileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool SetFileAttributes(string lpFileName, uint dwFileAttributes);
    

    WIN32_FIND_DATA structure is also slightly different:

        [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode), Serializable, BestFitMapping(false)]
        internal struct WIN32_FIND_DATA
        {
            internal FileAttributes dwFileAttributes;
            internal FILETIME ftCreationTime;
            internal FILETIME ftLastAccessTime;
            internal FILETIME ftLastWriteTime;
            internal int nFileSizeHigh;
            internal int nFileSizeLow;
            internal int dwReserved0;
            internal int dwReserved1;
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
            internal string cFileName;
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
            internal string cAlternative;
        }
    

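    To use long paths, the path needs the extended-length prefix. The \\?\UNC\ prefix used below is for UNC paths such as DFS shares (\\server\share\...); a local path such as C:\... takes the plain \\?\ prefix instead: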

    public void RemoveDirectory(string directoryPath)
    {
        var path = @"\\?\UNC\" + directoryPath.Trim(@" \/".ToCharArray());
        SearchAndDelete(path);
    }
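
    For illustration, here is a small helper that picks the prefix based on the kind of path. LongPath and ToExtended are hypothetical names that are not part of the answer's code, and the sketch assumes the input is already an absolute path that uses backslashes. Unlike the method above, it does not trim trailing separators or spaces.

```csharp
using System;

public static class LongPath
{
    private const string LocalPrefix = @"\\?\";
    private const string UncPrefix = @"\\?\UNC\";

    // Adds the extended-length prefix to an absolute Windows path.
    public static string ToExtended(string path)
    {
        if (path.StartsWith(LocalPrefix, StringComparison.Ordinal))
        {
            return path; // already prefixed (covers \\?\UNC\ as well)
        }
        if (path.StartsWith(@"\\", StringComparison.Ordinal))
        {
            // \\server\share\dir becomes \\?\UNC\server\share\dir
            return UncPrefix + path.Substring(2);
        }
        // C:\dir becomes \\?\C:\dir
        return LocalPrefix + path;
    }
}
```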
    

    and here's the main method:

    private void SearchAndDelete(string path)
    {
        var fd = new WIN32_FIND_DATA();
        var found = false;
        var handle = IntPtr.Zero;
        var invalidHandle = new IntPtr(-1);
        var fileAttributeDir = 0x00000010;
        var filesToRemove = new List<string>();
        try
        {
            handle = FindFirstFile(path + @"\*", ref fd);
            if (handle == invalidHandle) return;
            do
            {
                var current = fd.cFileName;
                if (((int)fd.dwFileAttributes & fileAttributeDir) != 0)
                {
                    if (current != "." && current != "..")
                    {
                        var newPath = Path.Combine(path, current);
                        SearchAndDelete(newPath);
                    }
                }
                else
                {
                    filesToRemove.Add(Path.Combine(path, current));
                }
                found = FindNextFile(handle, ref fd);
            } while (found);
        }
        finally
        {
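            // Only close a handle that FindFirstFile actually returned.
            if (handle != IntPtr.Zero && handle != invalidHandle) FindClose(handle);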
        }
        // Delete the files of this directory in parallel and collect the failures.
        var lockSource = new object();
        var exceptions = new List<Exception>();
        Parallel.ForEach(filesToRemove, file =>
        {
            // Clear the hidden and read-only flags so that DeleteFile is not refused.
            var attrs = GetFileAttributes(file);
            attrs &= ~(uint)0x00000002; // hidden
            attrs &= ~(uint)0x00000001; // read-only
            SetFileAttributes(file, attrs);
            if (!DeleteFile(file))
            {
                // Read the error code before any other call can overwrite it.
                var error = Marshal.GetLastWin32Error();
                var msg = string.Format("Cannot remove file {0}.{1}{2}", file.Replace(@"\\?\UNC", @"\"), Environment.NewLine, new Win32Exception(error).Message);
                lock (lockSource)
                {
                    exceptions.Add(new Exception(msg));
                }
            }
        });
        if (exceptions.Any())
        {
            throw new AggregateException(exceptions);
        }
        // The directory is empty now, so it can be removed as well.
        var dirAttr = GetFileAttributes(path);
        dirAttr &= ~(uint)0x00000002; // hidden
        dirAttr &= ~(uint)0x00000001; // read-only
        SetFileAttributes(path, dirAttr);
        if (!DeleteDirectory(path))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
    }
    

    Of course, we could go further: store the directories in a separate list outside that method and delete them later in another method, which could look like this:

    private void DeleteDirectoryTree(List<string> directories)
    {
        // Group directories by depth (number of path segments), deepest level first,
        // so that children are removed before their parents.
        var data = directories.GroupBy(d => d.Split('\\').Length,
            d => d,
            (key, dirs) => new
            {
                Level = key,
                Directories = dirs.ToList()
            }).OrderByDescending(l => l.Level);
        var exceptions = new List<Exception>();
        var lockSource = new object();
        foreach (var level in data)
        {
            // Directories on the same level are independent and can go in parallel.
            Parallel.ForEach(level.Directories, dir =>
            {
                var attrs = GetFileAttributes(dir);
                attrs &= ~(uint)0x00000002; // hidden
                attrs &= ~(uint)0x00000001; // read-only
                SetFileAttributes(dir, attrs);
                if (!DeleteDirectory(dir))
                {
                    // Read the error code before any other call can overwrite it.
                    var error = Marshal.GetLastWin32Error();
                    var msg = string.Format("Cannot remove directory {0}.{1}{2}", dir.Replace(@"\\?\UNC", @"\"), Environment.NewLine, new Win32Exception(error).Message);
                    lock (lockSource)
                    {
                        exceptions.Add(new Exception(msg));
                    }
                }
            });
        }
        if (exceptions.Any())
        {
            throw new AggregateException(exceptions);
        }
    }
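
    The depth grouping is the part that is easiest to get wrong, so here it is in isolation as a sketch. DirectoryOrdering, Depth and DeepestFirst are illustrative names, and depth is assumed to be the number of non-empty path segments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DirectoryOrdering
{
    // Number of non-empty path segments, used as the nesting depth.
    public static int Depth(string directory)
    {
        return directory.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Groups directories by depth and returns the deepest level first,
    // so that children can be removed before their parents.
    public static List<List<string>> DeepestFirst(IEnumerable<string> directories)
    {
        return directories
            .GroupBy(d => Depth(d))
            .OrderByDescending(g => g.Key)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

    Each inner list holds directories of the same depth; none of them can contain another, which is why one level can be processed in parallel.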
    
  • 2020-12-30 05:28

    The best choice (IMHO) would be to create a separate process to delete and count the files, and to check on its progress by polling; otherwise you might run into browser timeouts.

  • 2020-12-30 05:28

    Some improvements to speed it up in the back end:

    • Use Directory.EnumerateFiles(..): this streams the files as they are found, so deletion can start before the whole listing has been retrieved.

    • Use Parallel.ForEach(..): this deletes several files at the same time.

    It should be faster, but with this many files the HTTP request would probably still time out, so the back-end work should run in a separate worker thread and notify the web client of the result once it finishes.
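
    Put together, the two suggestions could look roughly like this. ParallelDelete, DeleteFiles and the maxParallelism parameter are names chosen for the sketch; the right degree of parallelism depends on the disk or share, so treat the value as something to tune.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ParallelDelete
{
    // Deletes every file under the directory and returns the paths that failed.
    public static ConcurrentBag<string> DeleteFiles(string directory, int maxParallelism)
    {
        var failed = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        // EnumerateFiles is lazy, so workers start before the listing is complete.
        var files = Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories);
        Parallel.ForEach(files, options, file =>
        {
            try
            {
                File.Delete(file);
            }
            catch (IOException)
            {
                failed.Add(file); // for example, the file is in use
            }
            catch (UnauthorizedAccessException)
            {
                failed.Add(file); // for example, read-only or no permission
            }
        });
        return failed;
    }
}
```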

  • 2020-12-30 05:29

    Do it in a separate thread, or post a message to a queue (maybe MSMQ?) that another application (maybe a Windows service) subscribes to; that application then performs the commands (e.g. "Delete e:\dir*.txt") in its own process.

    The message should probably just include the folder name. If you use something like NServiceBus and transactional queues, then you can post your message and return immediately as long as the message was posted successfully. If there is a problem actually processing the message, then it'll retry and eventually go on an error queue that you can watch and perform maintenance on.
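
    To show only the shape of the post-and-return pattern, here is an in-process stand-in built on BlockingCollection. It is not MSMQ or NServiceBus: there is no durability, no retry and no error queue, and DeleteQueue and Post are hypothetical names. The message is just the folder name, as suggested above.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class DeleteQueue : IDisposable
{
    private readonly BlockingCollection<string> _folders = new BlockingCollection<string>();
    private readonly Task _worker;

    public DeleteQueue()
    {
        // A single long-running consumer drains the queue in the background.
        _worker = Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
    }

    // Called by the web request: enqueue the folder name and return at once.
    public void Post(string folder)
    {
        _folders.Add(folder);
    }

    private void Consume()
    {
        foreach (var folder in _folders.GetConsumingEnumerable())
        {
            try
            {
                if (Directory.Exists(folder)) Directory.Delete(folder, recursive: true);
            }
            catch (IOException ex)
            {
                // A real queue would retry or move the message to an error queue.
                Console.Error.WriteLine("Failed to delete " + folder + ": " + ex.Message);
            }
        }
    }

    // Stops accepting messages and waits for the queued work to finish.
    public void Dispose()
    {
        _folders.CompleteAdding();
        _worker.Wait();
        _folders.Dispose();
    }
}
```

    With a real transactional queue the consumer would live in a separate process, such as the Windows service mentioned above, so the work survives a restart of the web application.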
