How do I check whether a file exists without exceptions?

北海茫月 2020-11-21 05:07

How do I check if a file exists or not, without using the try statement?

30 answers
  •  无人及你
    2020-11-21 05:30

    Although almost every possible way has been listed in (at least one of) the existing answers (e.g. Python 3.4 specific stuff was added), I'll try to group everything together.

    Note: every piece of Python standard library code that I'm going to post comes from version 3.5.3.

    Problem statement:

    1. Check whether a file exists (arguably also a folder, which is a "special" file)
    2. Don't use try / except / else / finally blocks

    Possible solutions:

    1. [Python 3]: os.path.exists(path) (also check other function family members like os.path.isfile, os.path.isdir, os.path.lexists for slightly different behaviors)

      os.path.exists(path)
      

      Return True if path refers to an existing path or an open file descriptor. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.

      All good, but if we follow the import tree:

      • os.path - posixpath.py (ntpath.py)

        • genericpath.py, line ~#20+

          def exists(path):
              """Test whether a path exists.  Returns False for broken symbolic links"""
              try:
                  st = os.stat(path)
              except os.error:
                  return False
              return True
          

      it's just a try / except block around [Python 3]: os.stat(path, *, dir_fd=None, follow_symlinks=True). So your code is try / except free, but lower in the call stack there's (at least) one such block. This also applies to the other functions (including os.path.isfile).
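
To make the "slightly different behaviors" of the family concrete, here is a minimal sketch. The helper name describe and the temporary file names are mine, purely for illustration; only the os.path calls come from the standard library.

```python
import os
import tempfile


def describe(path):
    """Collect what each os.path predicate reports for path."""
    return {
        "exists": os.path.exists(path),    # follows symlinks; False for broken links
        "lexists": os.path.lexists(path),  # True even for a broken symlink
        "isfile": os.path.isfile(path),    # regular file (or symlink to one)
        "isdir": os.path.isdir(path),      # directory (or symlink to one)
    }


# Build a throwaway directory holding one regular file, then probe three paths.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "sample.txt")
    with open(file_path, "w") as handle:
        handle.write("hello")
    file_report = describe(file_path)
    dir_report = describe(tmp_dir)
    missing_report = describe(os.path.join(tmp_dir, "missing.txt"))

print("file:   ", file_report)
print("folder: ", dir_report)
print("missing:", missing_report)
```

None of these calls needs a try statement in your own code, which is exactly the point made above: the try / except lives inside the library.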

      1.1. [Python 3]: Path.is_file()

      • It's a fancier (and more pythonic) way of handling paths, but
      • Under the hood, it does exactly the same thing (pathlib.py, line ~#1330):

        def is_file(self):
            """
            Whether this path is a regular file (also True for symlinks pointing
            to regular files).
            """
            try:
                return S_ISREG(self.stat().st_mode)
            except OSError as e:
                if e.errno not in (ENOENT, ENOTDIR):
                    raise
                # Path doesn't exist or is a broken symlink
                # (see https://bitbucket.org/pitrou/pathlib/issue/12/)
                return False
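
A short usage sketch of the pathlib variant, assuming a throwaway temporary directory (the file names are made up for the example):

```python
import tempfile
from pathlib import Path


def is_existing_file(path):
    """True only for an existing regular file (or a symlink to one)."""
    return Path(path).is_file()


with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    sample = base / "sample.txt"
    sample.write_text("hello")

    file_result = is_existing_file(sample)                  # an existing file
    dir_result = is_existing_file(base)                     # a folder, not a file
    missing_result = is_existing_file(base / "missing.txt") # nothing there

print(file_result, dir_result, missing_result)
```

Use Path.exists() instead if folders should count as well.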
        
    2. [Python 3]: With Statement Context Managers. Either:

      • Create one:

        class Swallow:  # Dummy example
            swallowed_exceptions = (FileNotFoundError,)
        
            def __enter__(self):
                print("Entering...")
        
            def __exit__(self, exc_type, exc_value, exc_traceback):
                print("Exiting:", exc_type, exc_value, exc_traceback)
                # Only swallow FileNotFoundError, not e.g. the TypeError raised
                # when the user passes a wrong argument such as None or a float.
                return exc_type in Swallow.swallowed_exceptions
        
        • And its usage - I'll replicate the os.path.isfile behavior (note that this is just for demonstration purposes, do not attempt to write such code for production):

          import os
          import stat
          
          
          def isfile_seaman(path):  # Dummy func
              result = False
              with Swallow():
                  result = stat.S_ISREG(os.stat(path).st_mode)
              return result
          
      • Use [Python 3]: contextlib.suppress(*exceptions) - which was specifically designed for selectively suppressing exceptions
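
For comparison with the hand-written Swallow class, this is roughly how the same dummy check might look with contextlib.suppress. The function name isfile_suppress is hypothetical, and I also suppress NotADirectoryError here, which is my own choice rather than something the Swallow example does.

```python
import contextlib
import os
import stat
import tempfile


def isfile_suppress(path):
    """Dummy os.path.isfile lookalike built on contextlib.suppress."""
    result = False
    # If os.stat raises one of the listed exceptions, the with block is left
    # silently and result keeps its initial value.
    with contextlib.suppress(FileNotFoundError, NotADirectoryError):
        result = stat.S_ISREG(os.stat(path).st_mode)
    return result


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "sample.txt")
    with open(existing, "w") as handle:
        handle.write("hello")
    print(isfile_suppress(existing))
    print(isfile_suppress(os.path.join(tmp_dir, "missing.txt")))
```

As with Swallow, other errors (for example the TypeError from passing None) still propagate.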


      But, they seem to be wrappers over try / except / else / finally blocks, as [Python 3]: The with statement states:

      This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.

    3. Filesystem traversal functions (and search the results for matching item(s))

      • [Python 3]: os.listdir(path='.') (or [Python 3]: os.scandir(path='.') on Python v3.5+, backport: [PyPI]: scandir)

        • Under the hood, both use:

          • Nix: [man7]: OPENDIR(3) / [man7]: READDIR(3) / [man7]: CLOSEDIR(3)
          • Win: [MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function

          via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c

        Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

      • [Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)
        • It uses os.listdir (os.scandir when available)
      • [Python 3]: glob.iglob(pathname, *, recursive=False) (or its predecessor: glob.glob)
        • Doesn't seem to be a traversal function per se (at least in some cases), but it still uses os.listdir


      Since these iterate over folders, they are in most cases inefficient for our problem (there are exceptions, like non-wildcard globbing, as @ShadowRanger pointed out), so I'm not going to dwell on them. Not to mention that in some cases, filename processing might be required.
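
For completeness, a minimal sketch of the traversal approach with os.scandir. The helper name file_in_directory is made up; the with form of os.scandir needs Python 3.6+, and the name comparison is an exact, case-sensitive match, which is one example of the filename processing mentioned above.

```python
import os
import tempfile


def file_in_directory(directory, name):
    """Scan directory and report whether it holds a regular file called name."""
    with os.scandir(directory) as entries:
        return any(entry.name == name and entry.is_file() for entry in entries)


with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "sample.txt"), "w") as handle:
        handle.write("hello")
    print(file_in_directory(tmp_dir, "sample.txt"))
    print(file_in_directory(tmp_dir, "missing.txt"))
```

Note that this only moves the problem one level up: if the folder itself doesn't exist, os.scandir raises FileNotFoundError.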

    4. [Python 3]: os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True) whose behavior is close to os.path.exists (actually it's wider, mainly because of the 2nd argument)

      • user permissions might restrict the file "visibility" as the doc states:

        ...test if the invoking user has the specified access to path. mode should be F_OK to test the existence of path...

      os.access("/tmp", os.F_OK)

      Since I also work in C, I use this method as well, because under the hood it calls native APIs (again, via "${PYTHON_SRC_DIR}/Modules/posixmodule.c"). But it also opens the door to user errors, and it's not as Pythonic as the other variants. So, as @AaronHall rightly pointed out, don't use it unless you know what you're doing:

      • Nix: [man7]: ACCESS(2) (!!! pay attention to the note about the security hole its usage might introduce !!!)
      • Win: [MS.Docs]: GetFileAttributesW function
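
A small sketch of what the 2nd argument buys you. The helper name access_report is hypothetical; the mode constants are the ones defined in the os module.

```python
import os
import tempfile


def access_report(path):
    """Ask os.access about existence and basic permissions for path."""
    return {
        "exists": os.access(path, os.F_OK),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "sample.txt")
    with open(existing, "w") as handle:
        handle.write("hello")
    existing_report = access_report(existing)
    missing_report = access_report(os.path.join(tmp_dir, "missing.txt"))

print(existing_report)
print(missing_report)
```

Keep in mind that the answer describes the moment of the call only; the file or its permissions can change before you actually open it.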

      Note: calling native APIs is also possible via [Python 3]: ctypes - A foreign function library for Python, but in most cases it's more complicated.

      (Win specific): Since vcruntime* (msvcr*) .dll exports a [MS.Docs]: _access, _waccess function family as well, here's an example:

      Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import os, ctypes
      >>> ctypes.CDLL("msvcrt")._waccess(u"C:\\Windows\\System32\\cmd.exe", os.F_OK)
      0
      >>> ctypes.CDLL("msvcrt")._waccess(u"C:\\Windows\\System32\\cmd.exe.notexist", os.F_OK)
      -1
      

      Notes:

      • Although it's not a good practice, I'm using os.F_OK in the call, but that's just for clarity (its value is 0)
      • I'm using _waccess so that the same code works on Python3 and Python2 (in spite of unicode related differences between them)
      • Although this targets a very specific area, it was not mentioned in any of the previous answers


      The Linux (Ubuntu 16 x64) counterpart as well:

      Python 3.5.2 (default, Nov 17 2016, 17:05:23)
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import os, ctypes
      >>> ctypes.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp", os.F_OK)
      0
      >>> ctypes.CDLL("/lib/x86_64-linux-gnu/libc.so.6").access(b"/tmp.notexist", os.F_OK)
      -1
      

      Notes:

      • Instead of hardcoding libc's path ("/lib/x86_64-linux-gnu/libc.so.6"), which may (and most likely will) vary across systems, None (or the empty string) can be passed to the CDLL constructor (ctypes.CDLL(None).access(b"/tmp", os.F_OK)). According to [man7]: DLOPEN(3):

        If filename is NULL, then the returned handle is for the main program. When given to dlsym(), this handle causes a search for a symbol in the main program, followed by all shared objects loaded at program startup, and then all shared objects loaded by dlopen() with the flag RTLD_GLOBAL.

        • Main (current) program (python) is linked against libc, so its symbols (including access) will be loaded
        • This has to be handled with care, since functions like main, Py_Main and (all the) others are available; calling them could have disastrous effects (on the current program)
        • This doesn't apply to Win, though (that's not such a big deal, since msvcrt.dll is located in "%SystemRoot%\System32", which is in %PATH% by default). I wanted to take things further and replicate this behavior on Win (and submit a patch), but as it turns out, [MS.Docs]: GetProcAddress function only "sees" exported symbols, so unless someone declares the functions in the main executable as __declspec(dllexport) (why on Earth would a regular person do that?), the main program is loadable but pretty much unusable
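
Putting the CDLL(None) note into a reusable form might look like the sketch below. It is POSIX only, the wrapper name libc_access_exists is hypothetical, and declaring argtypes / restype is my addition to keep ctypes from guessing the C signature.

```python
import ctypes
import os
import sys
import tempfile


def libc_access_exists(path):
    """Call libc's access(path, F_OK) through ctypes (POSIX only)."""
    if sys.platform == "win32":
        raise NotImplementedError("the CDLL(None) trick does not work on Windows")
    libc = ctypes.CDLL(None)  # handle for the main program, which links libc
    libc.access.argtypes = [ctypes.c_char_p, ctypes.c_int]
    libc.access.restype = ctypes.c_int
    # access() returns 0 on success and -1 on failure.
    return libc.access(os.fsencode(path), os.F_OK) == 0


if sys.platform != "win32":
    tmp_root = tempfile.gettempdir()
    print(libc_access_exists(tmp_root))
    print(libc_access_exists(os.path.join(tmp_root, "surely.not.existing.entry")))
```

os.fsencode converts a str path to the bytes that the C function expects.
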
    5. Install some third-party module with filesystem capabilities

      Most likely, it will rely on one of the ways above (maybe with slight customizations).
      One example would be (again, Win specific) [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions, which is a Python wrapper over WINAPIs.

      But, since this is more like a workaround, I'm stopping here.

    6. Another (lame) workaround (gainarie) is what I like to call the sysadmin approach: use Python as a wrapper to execute shell commands

      • Win:

        (py35x64_test) e:\Work\Dev\StackOverflow\q000082831>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\System32\\cmd.exe\" > nul 2>&1'))"
        0
        
        (py35x64_test) e:\Work\Dev\StackOverflow\q000082831>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os; print(os.system('dir /b \"C:\\Windows\\System32\\cmd.exe.notexist\" > nul 2>&1'))"
        1
        
      • Nix (Linux (Ubuntu)):

        [cfati@cfati-ubtu16x64-0:~]> python3 -c "import os; print(os.system('ls \"/tmp\" > /dev/null 2>&1'))"
        0
        [cfati@cfati-ubtu16x64-0:~]> python3 -c "import os; print(os.system('ls \"/tmp.notexist\" > /dev/null 2>&1'))"
        512
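
A note on the 512: on Unix, os.system returns the wait status rather than the plain exit code, so 512 corresponds to exit code 2 (512 >> 8). If you really want to go down this road, subprocess.run hands you the exit code directly. The sketch below is Unix only and assumes an ls binary on the PATH; the function name exists_via_ls is made up.

```python
import os
import subprocess
import tempfile


def exists_via_ls(path):
    """Unix-only sketch: run ls and treat exit code 0 as "path exists"."""
    completed = subprocess.run(
        ["ls", "-d", "--", path],   # -d: don't list folder contents; --: end of options
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null 2>&1"
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0


tmp_root = tempfile.gettempdir()
print(exists_via_ls(tmp_root))
print(exists_via_ls(os.path.join(tmp_root, "surely.not.existing.entry")))
```

Passing the arguments as a list also avoids the shell quoting gymnastics from the one-liners above.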
        

    Bottom line:

    • Do use try / except / else / finally blocks, because they can keep you from running into a series of nasty problems. The one counter-argument I can think of is performance: such blocks can be costly (mainly when an exception is actually raised), so try not to place them in code that is supposed to run hundreds of thousands of times per second (but since in most cases the check involves disk access, that won't be the case).
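
To illustrate the bottom line: checking first and opening afterwards leaves a window in which the file can disappear, while simply trying the operation does not. A minimal sketch (the function name read_text_if_present and the default parameter are made up for the example):

```python
import os
import tempfile


def read_text_if_present(path, default=None):
    """Return the file's text, or default if the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # The file was never there, or vanished before open() got to it.
        return default


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "sample.txt")
    with open(existing, "w", encoding="utf-8") as handle:
        handle.write("hello")
    present_text = read_text_if_present(existing)
    missing_text = read_text_if_present(os.path.join(tmp_dir, "missing.txt"), default="")

print(repr(present_text), repr(missing_text))
```

Only FileNotFoundError is handled here; permission problems and the like still surface as exceptions, which is usually what you want.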

    Final note(s):

    • I will try to keep this up to date. Any suggestions are welcome, and I will incorporate anything useful that comes up into the answer
