uWSGI and joblib Semaphore: Joblib will operate in serial mode

逝去的感伤 2021-02-10 14:48

I'm running joblib in a Flask application inside a Docker container, served by uWSGI (started with threads enabled), which is in turn started by supervisord.

The start

3 answers
  • 2021-02-10 15:11

    It seems that semaphores are not available in your image: joblib checks whether multiprocessing.Semaphore() can be created, and that fails when only root has read/write permission on the shared memory in /dev/shm. Have a look at this question and this answer.

    This is the output from one of my containers:

    $ ls -ld /dev/shm
    drwxrwxrwt 2 root root 40 Feb 19 15:23 /dev/shm
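
The same check can be made from inside the Python process that uWSGI runs. The following is a minimal sketch, not joblib's actual code: it assumes that the serial-mode fallback is triggered when creating a multiprocessing.Semaphore() raises an ImportError or OSError, and the helper names semaphore_available and describe_shm are hypothetical.

```python
import multiprocessing
import os
import stat


def semaphore_available() -> bool:
    """Return True if a multiprocessing semaphore can be created.

    Assumption: a failure here is what makes joblib warn that it
    will operate in serial mode.
    """
    try:
        multiprocessing.Semaphore()
        return True
    except (ImportError, OSError):
        return False


def describe_shm(path="/dev/shm"):
    """Return an ls-style permission string for path, or None if it is missing."""
    try:
        return stat.filemode(os.stat(path).st_mode)
    except OSError:
        return None


if __name__ == "__main__":
    print("semaphore available:", semaphore_available())
    print("/dev/shm mode:", describe_shm())
```

Running this as the same user that the application runs as should show whether the permissions on /dev/shm are the actual cause.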
    

    If you are running as non-root, you should change the permissions on /dev/shm. To set the correct permissions, you need to modify /etc/fstab in your Docker image:

    none /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
    
  • 2021-02-10 15:23

    Well, I did find an answer to my problem. It solves the issue in the sense that I can run a joblib-dependent library with supervisor and nginx in Docker. However, it is not very satisfying. Thus, I won't accept my own answer, but I am posting it here in case other people have the same problem and need an okayish fix.

    The solution is to replace uWSGI with gunicorn. Well, at least I now know whose fault it is. I would still appreciate an answer that solves the issue using uWSGI instead of gunicorn.

  • 2021-02-10 15:29

    This was quite a rabbit hole.

    The joblib issues page on GitHub has similar reports of joblib failing under uWSGI, but most of them concern the older multiprocessing backend. The new loky backend was supposed to solve these issues.

    There was a PR for the multiprocessing backend that solved this issue for uWSGI:

    joblib.Parallel(n_jobs=4, backend="multiprocessing")(joblib.delayed(sqrt)(i ** 2) for i in range(10))
    

    But it sometimes failed at random and fell back to the same issue that the PR above tried to solve.

    Further digging revealed that the current backend, loky, parallelizes across processes by default (docs). But these processes don't have shared memory access, so they need serialized and queued channels to communicate. This is probably the reason why uWSGI fails and gunicorn works.

    So I tried switching to threads instead of processes:

    joblib.Parallel(n_jobs=4, prefer="threads")(joblib.delayed(sqrt)(i ** 2) for i in range(10))
    

    And it works :)
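
For completeness, here is a self-contained version of the thread-based call above, with the imports the one-liner leaves out. It is only a sketch: the wrapper name roots_with_threads is hypothetical, and it assumes a joblib version that supports the prefer argument.

```python
from math import sqrt

from joblib import Parallel, delayed


def roots_with_threads(n=10, n_jobs=4):
    """Compute sqrt(i ** 2) for i in range(n) using joblib's thread-based backend."""
    # prefer="threads" is a soft hint: joblib picks a thread-based backend
    # unless a backend is forced elsewhere, so no worker processes are
    # spawned and no inter-process semaphores are needed.
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(sqrt)(i ** 2) for i in range(n)
    )


results = roots_with_threads()
print(results)
```

Note that threads share the interpreter's GIL, so this mainly helps when the work releases the GIL (I/O, or NumPy-style native code) rather than for pure-Python CPU-bound loops.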
