Python Subprocess: how/when do they close files?

Submitted by 独自空忆成欢 on 2020-04-18 04:01:42

Question


I wonder why subprocesses keep so many files open. I have an example in which some files seem to remain open forever (after the subprocess finishes and even after the program crashes).

Consider the following code:

import aiofiles
import tempfile

async def main():
    return [await fds_test(i) for i in range(2000)]

async def fds_test(index):
    print(f"Writing {index}")
    handle, temp_filename = tempfile.mkstemp(suffix='.dat', text=True)
    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')
        await fp.write('other stuff')
        await fp.write('EOF\n')

    print(f"Reading {index}")
    bash_cmd = 'cat {}'.format(temp_filename)
    process = await asyncio.create_subprocess_exec(*bash_cmd.split(), stdout=asyncio.subprocess.DEVNULL, close_fds=True)
    await process.wait()
    print(f"Process terminated {index}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

This spawns the processes one after the other (sequentially), so I expect the number of files open at any given time to stay around one. But that's not the case, and at some point I get the following error:

/Users/cglacet/.pyenv/versions/3.8.0/lib/python3.8/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1410             # Data format: "exception name:hex errno:description"
   1411             # Pickle is not used; it is complex and involves memory allocation.
-> 1412             errpipe_read, errpipe_write = os.pipe()
   1413             # errpipe_write must not be in the standard io 0, 1, or 2 fd range.
   1414             low_fds_to_close = []

OSError: [Errno 24] Too many open files

I tried running the same code without the stdout=asyncio.subprocess.DEVNULL option, but it still crashes. This answer suggested that option might be where the problem comes from, and the traceback does point at the line errpipe_read, errpipe_write = os.pipe(). But that doesn't seem to be the problem, since running without the option gives the same error.

In case you need more information, here is an excerpt of the output of lsof | grep python:

python3.8 19529 cglacet    7u      REG                1,5        138 12918796819 /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/tmpuxu_o4mf.dat
# ... 
# ~ 2000 entries later : 
python3.8 19529 cglacet 2002u      REG                1,5        848 12918802386 /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/tmpcaakgz3f.dat

These are the temporary files that my subprocesses are reading. The rest of the lsof output looks legitimate (open libraries such as pandas/numpy/scipy, etc.).

Now I have a doubt: maybe the problem comes from the aiofiles asynchronous context manager? Maybe it's the one not closing the files, rather than create_subprocess_exec?
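One way to narrow this down would be to count how many descriptors the process has open around each step. Here is a quick sketch (not part of my original code; it relies on /dev/fd, which exists on macOS and Linux):

import os

def count_open_fds():
    # /dev/fd lists the file descriptors currently open in this process
    # (available on macOS and Linux).
    return len(os.listdir('/dev/fd'))

print(f"open fds: {count_open_fds()}")

Calling this before and after the mkstemp/aiofiles block, and again after process.wait(), should show which step leaves a descriptor behind.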

There is a similar question here, but nobody really tries to explain/solve the problem (the answers only suggest increasing the limit): Python Subprocess: Too Many Open Files. I would really like to understand what is going on. My first goal is not necessarily to work around the problem temporarily (in the future I want to be able to run the function fds_test as many times as needed); my goal is to have a function that behaves as expected. I probably have to change either my expectation or my code, and that's why I ask this question.
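For reference, the workaround those answers suggest looks roughly like this (just a sketch using the standard resource module; the exact values are platform dependent, and raising the limit only hides the leak rather than fixing it):

import resource

# Current file-descriptor limits for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit, but never above the hard limit.
new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

But again, that is not the kind of fix I'm after here.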


As suggested in the comments here, I also tried to run python -m test test_subprocess -m test_close_fds -v, which gives:

== CPython 3.8.0 (default, Nov 28 2019, 20:06:13) [Clang 11.0.0 (clang-1100.0.33.12)]
== macOS-10.14.6-x86_64-i386-64bit little-endian
== cwd: /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/test_python_52961
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 5.29 Run tests sequentially
0:00:00 load avg: 5.29 [1/1] test_subprocess
test_close_fds (test.test_subprocess.POSIXProcessTestCase) ... ok
test_close_fds (test.test_subprocess.Win32ProcessTestCase) ... skipped 'Windows specific tests'

----------------------------------------------------------------------

Ran 2 tests in 0.142s

OK (skipped=1)

== Tests result: SUCCESS ==

1 test OK.

Total duration: 224 ms
Tests result: SUCCESS

So it seems the files should be correctly closed by subprocess; I'm a bit lost here.


Answer 1:


The problem doesn't come from create_subprocess_exec; the problem in this code is that tempfile.mkstemp() actually opens the file:

mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) …

I thought it would only create the file. To solve my problem I simply added a call to os.close(handle), which removes the error but is a bit odd (the file ends up being opened twice).
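A minimal sketch of that first fix (the same code as in the question, with only the extra os.close on the handle that mkstemp returns):

import asyncio
import os
import tempfile

import aiofiles

async def fds_test(index):
    # mkstemp() both creates the file AND opens it, returning an OS-level fd.
    handle, temp_filename = tempfile.mkstemp(suffix='.dat', text=True)
    os.close(handle)  # close the fd we never use, so it doesn't accumulate

    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')

    process = await asyncio.create_subprocess_exec('cat', temp_filename, close_fds=True)
    await process.wait()

To avoid opening the file twice in the first place, I rewrote the function so that it no longer uses mkstemp at all: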

import aiofiles
import tempfile
import uuid


async def main():
    await asyncio.gather(*[fds_test(i) for i in range(10)])

async def fds_test(index):
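    # Build the temporary file's path by hand, so nothing opens the file before aiofiles does.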
    dir_name = tempfile.gettempdir()
    file_id = f"{tempfile.gettempprefix()}{uuid.uuid4()}"
    temp_filename = f"{dir_name}/{file_id}.dat"

    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')

    bash_cmd = 'cat {}'.format(temp_filename)
    process = await asyncio.create_subprocess_exec(*bash_cmd.split(), close_fds=True)
    await process.wait()


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
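As a side note, another way to avoid mkstemp's open handle without building the path by hand would be tempfile.NamedTemporaryFile(delete=False), which returns a regular file object that the with block closes (this is just an alternative sketch, not what I ended up using):

import asyncio
import tempfile

async def fds_test(index):
    # NamedTemporaryFile(delete=False) creates the file and hands back a normal
    # file object; leaving the `with` block closes it, so no descriptor leaks.
    with tempfile.NamedTemporaryFile(mode='w', suffix='.dat', delete=False) as fp:
        fp.write('stuff')
        temp_filename = fp.name

    process = await asyncio.create_subprocess_exec('cat', temp_filename, close_fds=True)
    await process.wait()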

Now I wonder why the error was raised by subprocess and not by tempfile.mkstemp itself. Maybe it's because subprocess opens so many more files at once that it's simply unlikely the temporary-file creation is the call that breaks the limit …



Source: https://stackoverflow.com/questions/60928873/python-subprocess-how-when-do-they-close-file
