Use Docker for Google Cloud Dataflow dependencies

Backend · Open · 3 answers · 1586 views
醉酒成梦  2021-01-19 01:49

I am interested in using Google Cloud Dataflow to parallel-process videos. My job uses both OpenCV and TensorFlow. Is it possible to just run the workers inside a Docker instance?

3 Answers
  • 2021-01-19 02:21

    One solution is to issue the pip install commands through the setup.py custom-commands option described under Non-Python Dependencies.

    Doing this downloads the manylinux wheels instead of the source distributions that requirements-file processing would stage.
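    A minimal sketch of that setup.py custom-commands pattern (the package list and project name below are illustrative assumptions, not from this answer):

        # setup.py -- staged to every worker via the --setup_file pipeline option.
        # The pip packages listed are placeholders; swap in the exact OpenCV and
        # TensorFlow builds your job needs.
        import subprocess
        from distutils.command.build import build as _build

        import setuptools

        CUSTOM_COMMANDS = [
            ['pip', 'install', 'opencv-python'],
            ['pip', 'install', 'tensorflow'],
        ]

        class build(_build):
            # Run the custom commands as part of the normal build step on each worker.
            sub_commands = _build.sub_commands + [('CustomCommands', None)]

        class CustomCommands(setuptools.Command):
            user_options = []

            def initialize_options(self):
                pass

            def finalize_options(self):
                pass

            def run(self):
                # Execute each install command, failing loudly if any of them fail.
                for command in CUSTOM_COMMANDS:
                    subprocess.check_call(command)

        setuptools.setup(
            name='video-processing-job',   # placeholder name
            version='0.0.1',
            packages=setuptools.find_packages(),
            cmdclass={'build': build, 'CustomCommands': CustomCommands},
        )

    The job is then launched with --setup_file ./setup.py so this file is staged and executed on each worker at startup.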

  • 2021-01-19 02:26

    If you have a large number of videos, you will have to incur the large startup cost regardless; that is the nature of grid computing in general.

    The other side of this is that you could use larger machines for the job than the n1-standard-1 machines, amortizing the download cost across fewer machines that could potentially process more videos at once, provided the processing is coded to take advantage of them.
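    As a sketch, the worker size can be raised through the pipeline options (the flag values below are placeholders, not from this answer):

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions([
            '--runner=DataflowRunner',
            '--project=my-project',
            '--region=us-central1',
            '--temp_location=gs://my-bucket/tmp',
            '--setup_file=./setup.py',              # stages the dependency installs
            '--worker_machine_type=n1-standard-8',  # larger than the default n1-standard-1
        ])

        with beam.Pipeline(options=options) as p:
            (p
             | beam.Create(['gs://my-bucket/videos/a.mp4'])  # placeholder inputs
             | beam.Map(lambda path: path))                  # real video-processing DoFn goes here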

  • It is not possible to modify or switch the default Dataflow worker container. You need to install the dependencies according to the documentation.
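    For the pure-Python part of the dependencies, the documented route is a requirements file passed through the pipeline options; a sketch (the file name and what it pins are assumptions):

        from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

        options = PipelineOptions(['--runner=DataflowRunner'])
        # requirements.txt would list packages such as opencv-python and tensorflow;
        # Dataflow stages and installs them on each worker.
        options.view_as(SetupOptions).requirements_file = 'requirements.txt'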
