How can I safely create a nested directory?

旧时难觅i 2020-11-22 00:07

What is the most elegant way to check if the directory a file is going to be written to exists, and if not, create the directory using Python? Here is what I tried:

27 answers
  • 2020-11-22 00:48

    In Python 3.4 and later you can also use the pathlib module:

    from pathlib import Path
    path = Path("/my/directory/filename.txt")
    try:
        if not path.parent.exists():
            path.parent.mkdir(parents=True)
    except OSError:
        # Handle the error here; you can also catch specific subclasses
        # such as FileExistsError or PermissionError.
        raise
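
    The exists() check above still leaves a small window in which another process can create the directory first. A minimal sketch of a variant without that window, assuming Python 3.5 or later (where mkdir() accepts exist_ok); the helper name ensure_parent_dir and the temporary paths are made up for illustration:

```python
import tempfile
from pathlib import Path

def ensure_parent_dir(file_path):
    """Create the parent directory of file_path, including missing ancestors.

    Hypothetical helper, not part of pathlib.
    """
    parent = Path(file_path).parent
    # parents=True creates missing ancestors; exist_ok=True (Python 3.5+)
    # suppresses FileExistsError when the directory is already there.
    parent.mkdir(parents=True, exist_ok=True)
    return parent

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my" / "directory" / "filename.txt"
    ensure_parent_dir(target)
    ensure_parent_dir(target)   # calling it again raises no error
    target.write_text("hello")
    print(target.parent.is_dir())
```

    Note that exist_ok=True only ignores an existing directory; if the last path component is an existing regular file, FileExistsError is still raised.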
    
  • 2020-11-22 00:49

    I found this Q/A and I was initially puzzled by some of the failures and errors I was getting. I am working in Python 3 (v.3.5 in an Anaconda virtual environment on an Arch Linux x86_64 system).

    Consider this directory structure:

    └── output/         ## dir
       ├── corpus       ## file
       ├── corpus2/     ## dir
       └── subdir/      ## dir
    

    Here are my experiments and notes, which clarify things:

    # ----------------------------------------------------------------------------
    # [1] https://stackoverflow.com/questions/273192/how-can-i-create-a-directory-if-it-does-not-exist
    
    import os
    
    """ Notes:
            1.  Include a trailing slash at the end of the directory path
                ("Method 1," below).
            2.  If a subdirectory in your intended path matches an existing file
                with same name, you will get the following error:
                "NotADirectoryError: [Errno 20] Not a directory:" ...
    """
    # Uncomment and try each of these "out_dir" paths, singly:
    
    # ----------------------------------------------------------------------------
    # METHOD 1:
    # Re-running does not overwrite existing directories and files; no errors.
    
    # out_dir = 'output/corpus3'                ## no error but no dir created (missing trailing /)
    # out_dir = 'output/corpus3/'               ## works
    # out_dir = 'output/corpus3/doc1'           ## no error but no dir created (missing trailing /)
    # out_dir = 'output/corpus3/doc1/'          ## works
    # out_dir = 'output/corpus3/doc1/doc.txt'   ## no error but no file created (os.makedirs creates dir, not files!  ;-)
    # out_dir = 'output/corpus2/tfidf/'         ## fails with "Errno 20" (existing file named "corpus2")
    # out_dir = 'output/corpus3/tfidf/'         ## works
    # out_dir = 'output/corpus3/a/b/c/d/'       ## works
    
    # [2] https://docs.python.org/3/library/os.html#os.makedirs
    
    # Uncomment these to run "Method 1":
    
    #directory = os.path.dirname(out_dir)
    #os.makedirs(directory, mode=0o777, exist_ok=True)
    
    # ----------------------------------------------------------------------------
    # METHOD 2:
    # Re-running does not overwrite existing directories and files; no errors.
    
    # out_dir = 'output/corpus3'                ## works
    # out_dir = 'output/corpus3/'               ## works
    # out_dir = 'output/corpus3/doc1'           ## works
    # out_dir = 'output/corpus3/doc1/'          ## works
    # out_dir = 'output/corpus3/doc1/doc.txt'   ## no error but creates a .../doc.txt/ dir
    # out_dir = 'output/corpus2/tfidf/'         ## fails with "Errno 20" (existing file named "corpus2")
    # out_dir = 'output/corpus3/tfidf/'         ## works
    # out_dir = 'output/corpus3/a/b/c/d/'       ## works
    
    # Uncomment these to run "Method 2":
    
    #import os, errno
    #try:
    #       os.makedirs(out_dir)
    #except OSError as e:
    #       if e.errno != errno.EEXIST:
    #               raise
    # ----------------------------------------------------------------------------
    

    Conclusion: in my opinion, "Method 2" is more robust.
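
    The "no error but no dir created" results under Method 1 come from os.path.dirname(), which drops the last path component unless the path ends with a separator. A small sketch that prints what Method 1 actually hands to os.makedirs(); the function name method1_target is made up for illustration:

```python
import os.path

def method1_target(out_dir):
    """Return the directory that Method 1 passes to os.makedirs()."""
    return os.path.dirname(out_dir)

for out_dir in ("output/corpus3",
                "output/corpus3/",
                "output/corpus3/doc1/doc.txt"):
    print(repr(out_dir), "->", repr(method1_target(out_dir)))
```

    Without the trailing slash, 'output/corpus3' is reduced to 'output', which already exists, so exist_ok=True silently does nothing.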

    [1] How can I create a directory if it does not exist?

    [2] https://docs.python.org/3/library/os.html#os.makedirs

  • 2020-11-22 00:49

    Why not use the subprocess module, if you are running on a machine whose mkdir command supports the -p option? This works on Python 2.7 and Python 3.6.

    from subprocess import call
    call(['mkdir', '-p', 'path1/path2/path3'])
    

    Should do the trick on most systems.

    In situations where portability doesn't matter (e.g., inside Docker), the solution is a clean two lines. You also don't have to add logic to check whether the directories exist. Finally, it is safe to re-run without any side effects.

    If you need error handling:

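    from subprocess import check_call, CalledProcessError
    try:
        check_call(['mkdir', '-p', 'path1/path2/path3'])
    except (CalledProcessError, OSError):
        # CalledProcessError: mkdir exited with a non-zero status.
        # OSError: the mkdir command could not be started at all.
        # Handle the failure here, or re-raise it.
        raise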
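
    If you also want mkdir's own error message, one possible sketch is below. It assumes Python 3.5+ (for subprocess.run) and a POSIX-style mkdir on the PATH; the wrapper name mkdir_p and the RuntimeError it raises are choices made for this example only:

```python
import os
import subprocess
import tempfile

def mkdir_p(path):
    """Run `mkdir -p path`; raise RuntimeError carrying mkdir's stderr on failure.

    Hypothetical wrapper for illustration.
    """
    try:
        subprocess.run(["mkdir", "-p", path], check=True,
                       stderr=subprocess.PIPE, universal_newlines=True)
    except subprocess.CalledProcessError as exc:
        raise RuntimeError("mkdir -p failed: " + exc.stderr.strip()) from exc

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "path1", "path2", "path3")
    mkdir_p(nested)
    mkdir_p(nested)   # re-running is harmless
    print(os.path.isdir(nested))
```

    If the mkdir executable is missing, subprocess.run raises FileNotFoundError, which this sketch deliberately lets propagate.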
    
  • 2020-11-22 00:50

    Using try/except with the right error code from the errno module gets rid of the race condition and is cross-platform:

    import os
    import errno
    
    def make_sure_path_exists(path):
        try:
            os.makedirs(path)
        except OSError as exception:
            if exception.errno != errno.EEXIST:
                raise
    

    In other words, we try to create the directories, but if they already exist we ignore the error. On the other hand, any other error gets reported. For example, if you create dir 'a' beforehand and remove all permissions from it, you will get an OSError raised with errno.EACCES (Permission denied, error 13).
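
    One caveat: os.makedirs() also reports EEXIST when the path exists but is a regular file, so the function above would silently accept that case. A possible variant that only ignores the error when the path really is a directory (the name make_sure_dir_exists is made up for this sketch):

```python
import errno
import os
import tempfile

def make_sure_dir_exists(path):
    """Create path and its ancestors; succeed quietly only if path is a directory."""
    try:
        os.makedirs(path)
    except OSError as exception:
        # Ignore "already exists" only when what exists is a directory.
        if exception.errno != errno.EEXIST or not os.path.isdir(path):
            raise

with tempfile.TemporaryDirectory() as tmp:
    make_sure_dir_exists(os.path.join(tmp, "a", "b"))
    make_sure_dir_exists(os.path.join(tmp, "a", "b"))   # already there: no error

    blocker = os.path.join(tmp, "blocker")
    open(blocker, "w").close()                          # a regular file
    try:
        make_sure_dir_exists(blocker)
    except OSError as exc:
        print("refused:", exc.errno == errno.EEXIST)
```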

  • 2020-11-22 00:51

    The relevant Python documentation suggests the use of the EAFP coding style (Easier to Ask for Forgiveness than Permission). This means that the code

    import errno
    import os

    try:
        os.makedirs(path)
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise
        else:
            print("\nBE CAREFUL! Directory %s already exists." % path)
    

    is better than the alternative

    if not os.path.exists(path):
        os.makedirs(path)
    else:
        print("\nBE CAREFUL! Directory %s already exists." % path)
    

    The documentation suggests this precisely because of the race condition discussed in this question. In addition, as others mention here, there is a performance advantage in querying the OS once instead of twice. Finally, the argument sometimes put forward in favour of the second snippet, that the developer knows the environment the application runs in, holds only in the special case where the program has set up a private environment for itself (and for other instances of the same program).

    Even in that case, this is bad practice and can lead to long, fruitless debugging sessions. For example, the fact that we set the permissions on a directory should not leave us with the impression that the permissions are appropriate for our purposes: a parent directory could be mounted with other permissions. In general, a program should always work correctly, and the programmer should not assume one specific environment.
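
    The race is hard to trigger on demand, but it can be simulated. In the sketch below (Python 3), the between_check_and_create hook is a made-up stand-in for another process that creates the directory right after the exists() check; both function names are hypothetical:

```python
import os
import tempfile

def lbyl_makedirs(path, between_check_and_create=None):
    """Check first, then create (Look Before You Leap)."""
    if not os.path.exists(path):
        if between_check_and_create is not None:
            between_check_and_create()  # simulated interference in the race window
        os.makedirs(path)

def eafp_makedirs(path):
    """Just try to create; tolerate 'already exists' only for a directory."""
    try:
        os.makedirs(path)
    except FileExistsError:
        if not os.path.isdir(path):
            raise

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a")
    try:
        # Another "process" creates the directory right after our check.
        lbyl_makedirs(target, between_check_and_create=lambda: os.mkdir(target))
    except FileExistsError:
        print("LBYL failed: directory appeared after the check")
    eafp_makedirs(target)  # same situation, no error
    print("EAFP succeeded:", os.path.isdir(target))
```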

  • 2020-11-22 00:53

    You can use mkpath

    # Create a directory and any missing ancestor directories. 
    # If the directory already exists, do nothing.
    
    from distutils.dir_util import mkpath
    mkpath("test")    
    

    Note that it will create the ancestor directories as well.

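    It works in Python 2 and in Python 3 up to 3.11. The distutils package was deprecated in Python 3.10 and removed from the standard library in Python 3.12.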
