dsl.ContainerOp with python

Submitted by 江枫思渺然 on 2021-01-05 10:38:55

Question


What are the options to download .py files into the execution environment?

In this example:

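from kfp import dsl  # KFP SDK v1, which provides dsl.ContainerOp


class Preprocess(dsl.ContainerOp):

  def __init__(self, name, bucket, cutoff_year):
    super(Preprocess, self).__init__(
      name=name,
      # image needs to be a compile-time string
      image='gcr.io/<project>/<image-name>/cpu:v1',
      # run_preprocess.py must already exist in the image's working directory
      command=['python3', 'run_preprocess.py'],
      arguments=[
        '--bucket', bucket,
        '--cutoff_year', cutoff_year,
        '--kfp'
      ],
      # the script is expected to write its result to this file
      file_outputs={'blob-path': '/blob_path.txt'}
    )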

Here the run_preprocess.py file is invoked from the command line inside the container.

The question is: how do I get that file in there?

I have seen this interesting example, which clones the code before running the pipeline: https://github.com/benjamintanweihao/kubeflow-mnist/blob/master/pipeline.py

Another way would be to git clone the code in the Dockerfile (although the image would take forever to build).

What are other options?


Answer 1:


To kickstart KFP development using Python, try the following tutorial: Data passing in python components

"it clones the code before running the pipeline"
"The other way would be git cloning with Dockerfile (although the image would take forever to build)"

Ideally, the files should be inside the container image (the Dockerfile method). This ensures maximum reproducibility.

For Python scripts that are not very complex, the Lightweight python component feature lets you create a component from a Python function. In this case the script code is stored in the component's command line, so you do not need to upload the code anywhere.
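A minimal sketch of the lightweight-component approach, assuming the KFP SDK v1 (which the question's dsl.ContainerOp implies), where kfp.components.create_component_from_func is available. The body of preprocess, the blob path it returns, and the python:3.8 base image are hypothetical placeholders, not taken from the question.

```python
def preprocess(bucket: str, cutoff_year: int) -> str:
    """Hypothetical stand-in for the logic in run_preprocess.py."""
    # Lightweight components must be self-contained: any imports the
    # function needs should go inside the function body.
    blob_path = f"gs://{bucket}/preprocessed/cutoff_{cutoff_year}.csv"
    return blob_path


def build_preprocess_op():
    """Wrap preprocess() as a pipeline component (requires KFP SDK v1)."""
    from kfp.components import create_component_from_func
    # base_image is an assumption; any image with Python 3 should work.
    return create_component_from_func(preprocess, base_image="python:3.8")
```

Inside a pipeline function, calling build_preprocess_op() and then invoking the result with bucket and cutoff_year would create the step. The function source travels with the component definition, so no script has to be copied into the image.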

Putting scripts somewhere remote (e.g. cloud storage or website) is possible, but can reduce reliability and reproducibility.
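For illustration, the "fetch the code at run time" option could look roughly like the sketch below. It only builds the command list that would be passed as command= to dsl.ContainerOp. The helper name clone_and_run_command, the /src working directory, and the example repository URL are hypothetical, and the image is assumed to contain sh, git, and python3.

```python
import shlex


def clone_and_run_command(repo_url, script, workdir="/src"):
    """Build a container command that clones repo_url, then runs script."""
    shell_line = (
        f"git clone --depth 1 {shlex.quote(repo_url)} {shlex.quote(workdir)} "
        f"&& cd {shlex.quote(workdir)} "
        f'&& exec python3 {shlex.quote(script)} "$@"'
    )
    # With `sh -c <line> <name> <args...>`, everything after <name> becomes
    # "$@", so the step's arguments are forwarded to the script.
    return ["sh", "-c", shell_line, "sh"]


command = clone_and_run_command(
    "https://example.com/repo.git",  # hypothetical repository
    "run_preprocess.py",
)
```

Because the clone happens on every run, the step depends on the repository being reachable and on whatever commit is current at that moment, which is the reliability and reproducibility cost mentioned above.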

P.S.

"although the image would take forever to build"

It shouldn't. The first build might be slow because the base image has to be pulled, but after that it should be fast, since only the new layers are pushed. (This requires choosing a good base image that already has all dependencies installed, so that your Dockerfile only adds your scripts.)



Source: https://stackoverflow.com/questions/64838474/dsl-containerop-with-python
