Why is shutil.rmtree() so slow?

Posted by 陌路散爱 on 2019-12-10 16:12:57

Question


I looked up how to remove a directory in Python and was pointed to shutil.rmtree(). Its speed surprised me compared with what I'd expect from rm --recursive. Are there faster alternatives, short of using the subprocess module?


Answer 1:


The Python 2 implementation does a lot of extra processing:

import os
import stat
import sys

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info(). If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        # a new path string is built for every entry
        fullname = os.path.join(path, name)
        try:
            # one lstat() per entry, wrapped in a Python stat_result object
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

Note the os.path.join() used to build every new filename; string operations do take time. The rm(1) implementation instead uses the unlinkat(2) system call, which removes an entry by name relative to an open directory file descriptor and needs no additional string operations. (In fact, it also saves the kernel from walking through an entire namei() lookup just to find the common parent directory, over and over again. The kernel's dentry cache is good and useful, but that can still be a fair amount of in-kernel string manipulation and comparison.) The rm(1) utility gets to bypass all that string manipulation and just use a file descriptor for the directory.
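The directory-file-descriptor approach described above can also be used from Python 3, because os.open(), os.unlink() and os.rmdir() accept a dir_fd argument (implemented with openat(2) and unlinkat(2) on POSIX systems). The following is a minimal sketch, assuming a POSIX platform and Python 3.7+ (needed for os.scandir() on a file descriptor); rmtree_fd() and _rmtree_contents() are hypothetical names, and the sketch has no ignore_errors/onerror handling. Since Python 3.3, shutil.rmtree() itself switches to a similar fd-based walk on platforms that support it (shutil.rmtree.avoids_symlink_attacks is True there).

```python
import os


def rmtree_fd(path):
    """Delete a directory tree using directory file descriptors (POSIX only).

    Hypothetical helper: entries are removed by name relative to an open
    directory fd, so full paths are never rebuilt with os.path.join().
    """
    if os.path.islink(path):
        # same policy as shutil.rmtree(): refuse to operate on a top-level symlink
        raise OSError("Cannot call rmtree on a symbolic link")
    dir_fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        _rmtree_contents(dir_fd)
    finally:
        os.close(dir_fd)
    os.rmdir(path)


def _rmtree_contents(dir_fd):
    """Empty the directory open as dir_fd (hypothetical helper, no error hooks)."""
    # os.scandir() accepts a directory fd on POSIX (Python 3.7+).
    # Entries are collected before anything is deleted.
    with os.scandir(dir_fd) as it:
        entries = list(it)
    for entry in entries:
        # follow_symlinks=False: a symlink to a directory is unlinked, not descended into
        if entry.is_dir(follow_symlinks=False):
            child_fd = os.open(entry.name, os.O_RDONLY | os.O_DIRECTORY, dir_fd=dir_fd)
            try:
                _rmtree_contents(child_fd)
            finally:
                os.close(child_fd)
            os.rmdir(entry.name, dir_fd=dir_fd)  # unlinkat(dir_fd, name, AT_REMOVEDIR)
        else:
            os.unlink(entry.name, dir_fd=dir_fd)  # unlinkat(dir_fd, name, 0)
```

Each entry is removed by name relative to its parent's descriptor, so no full path string is ever rebuilt. Whether this is measurably faster than the path-based version depends on the tree and the filesystem.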

Furthermore, both rm(1) and rmtree() check the st_mode of every file and directory in the tree, but the C implementation does not need to turn every struct stat into a Python object just to perform a simple integer mask operation. I don't know how long this conversion takes, but it happens once for every file, directory, pipe, symlink, etc. in the tree.
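The per-entry cost described above comes from how rmtree() decides whether an entry is a directory: one os.lstat() call per entry, each producing an os.stat_result object. As a hedged illustration rather than a measurement, the two hypothetical functions below count the subdirectories of one directory both ways. The second uses os.scandir() (Python 3.6+ for the with form), whose DirEntry.is_dir() can often answer from the file type that readdir(3) already returned, without a separate stat call; on filesystems that don't report a type, it falls back to stat.

```python
import os
import stat


def count_dirs_lstat(path):
    """rmtree()-style check: one lstat() and one stat_result object per entry."""
    count = 0
    for name in os.listdir(path):
        mode = os.lstat(os.path.join(path, name)).st_mode
        if stat.S_ISDIR(mode):
            count += 1
    return count


def count_dirs_scandir(path):
    """scandir()-style check: the type usually comes straight from readdir()."""
    with os.scandir(path) as entries:
        # follow_symlinks=False matches the lstat() semantics above
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))
```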




Answer 2:


If you care about speed:

import os, shlex
os.system("rm -fr -- %s" % shlex.quote(your_dirname))

Apart from that, I did not find shutil.rmtree() much slower. Of course there is some extra overhead at the Python level, but I would only believe such a claim if you provided reasonable numbers.
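To get such numbers, a rough timing harness like the following could be used. It is a sketch under stated assumptions: make_tree(), rm_rf() and time_removal() are hypothetical helpers, the tree shape (directories of empty files) is arbitrary, and rm_rf() runs rm -rf through subprocess, so its time includes starting a process. Results will depend heavily on the filesystem, the OS cache and the Python version.

```python
import os
import shutil
import subprocess
import tempfile
import time


def make_tree(root, n_dirs, files_per_dir):
    """Create n_dirs subdirectories of root, each holding files_per_dir empty files."""
    for d in range(n_dirs):
        sub = os.path.join(root, "d%03d" % d)
        os.mkdir(sub)
        for f in range(files_per_dir):
            open(os.path.join(sub, "f%04d" % f), "w").close()


def rm_rf(path):
    """Remove path with the external rm command (includes process start-up time)."""
    subprocess.run(["rm", "-rf", "--", path], check=True)


def time_removal(remove, n_dirs=20, files_per_dir=100):
    """Build a fresh tree and return how many seconds remove(tree) takes."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "tree")
    os.mkdir(root)
    make_tree(root, n_dirs, files_per_dir)
    start = time.perf_counter()
    remove(root)
    elapsed = time.perf_counter() - start
    os.rmdir(base)  # raises OSError if remove() left anything behind
    return elapsed


if __name__ == "__main__":
    for label, fn in [("shutil.rmtree", shutil.rmtree), ("rm -rf", rm_rf)]:
        print("%-14s %.4f s" % (label, time_removal(fn)))
```

Running each variant several times and comparing the spread gives a fairer picture than a single run.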




Answer 3:


While I don't know what's wrong, you can try other methods, e.g. remove all the files first and then the directories:

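import os

# topdown=False yields each directory after its subdirectories, so it is empty by then;
# symlinks to directories appear in dirs but are not descended into, so unlink them too.
for root, dirs, files in os.walk("path", topdown=False):
    for name in files + [d for d in dirs if os.path.islink(os.path.join(root, d))]:
        os.remove(os.path.join(root, name))
    os.rmdir(root)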


Source: https://stackoverflow.com/questions/5470939/why-is-shutil-rmtree-so-slow
