Question
What are the options to download .py files into the execution environment?
In this example:
from kfp import dsl

class Preprocess(dsl.ContainerOp):
    def __init__(self, name, bucket, cutoff_year):
        super(Preprocess, self).__init__(
            name=name,
            # image needs to be a compile-time string
            image='gcr.io/<project>/<image-name>/cpu:v1',
            command=['python3', 'run_preprocess.py'],
            arguments=[
                '--bucket', bucket,
                '--cutoff_year', cutoff_year,
                '--kfp',
            ],
            file_outputs={'blob-path': '/blob_path.txt'}
        )
The run_preprocess.py file is invoked from the command line. The question is: how do I get that file into the execution environment?
I have seen this interesting example: https://github.com/benjamintanweihao/kubeflow-mnist/blob/master/pipeline.py , which clones the code before running the pipeline.
Another way would be to git clone inside the Dockerfile (although the image would take forever to build).
What are other options?
Answer 1:
To kickstart KFP development in Python, try the following tutorial: Data passing in Python components.
"it clones the code before running the pipeline" / "The other way would be git cloning with Dockerfile (although the image would take forever to build)"
Ideally, the files should be inside the container image (the Dockerfile method). This ensures maximum reproducibility.
For not-very-complex Python scripts, the Lightweight Python component feature allows you to create a component from a Python function. In that case the script code is stored in the component's command line, so you do not need to upload the code anywhere.
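To illustrate the mechanism (a simplified sketch, not the actual KFP implementation — in the v1 SDK you would create such a component with `kfp.components.create_component_from_func`), the function's source effectively travels inside the container command itself:

```python
import subprocess
import sys

# The component's code, stored directly in the command line rather than in a file.
script = """
def add(a: float, b: float) -> float:
    return a + b

print(add(2, 3))
"""

# Roughly what a lightweight component does: embed the source in the container
# command, so nothing needs to be uploaded or baked into the image.
command = [sys.executable, '-c', script]
result = subprocess.run(command, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # prints 5
```

Because the code rides along with the pipeline definition, the container image only needs a Python interpreter and the required libraries.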
Putting scripts somewhere remote (e.g. cloud storage or website) is possible, but can reduce reliability and reproducibility.
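One way to do that (a sketch only — the URL is a placeholder and it assumes curl exists in the image) is to download the script in the component's command before executing it; note this makes every run depend on the remote host being reachable and the file not changing:

```python
# Hypothetical: fetch run_preprocess.py at container startup, then run it.
script_url = 'https://example.com/run_preprocess.py'  # placeholder URL

command = [
    'sh', '-c',
    # "$0" receives the first argument after the script string; "$@" the rest.
    'curl -sSL "$0" -o /tmp/run_preprocess.py && python3 /tmp/run_preprocess.py "$@"',
    script_url,
    '--bucket', 'my-bucket',
]
```

This keeps the image generic, but trades away the reproducibility you get when the script is baked into the image.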
P.S.
"although the image would take forever to build"
It shouldn't. The first build might be slow because the base image has to be pulled, but subsequent builds should be fast, since only the new layers need to be built and pushed. (This requires choosing a good base image that has all dependencies installed, so your Dockerfile only adds your scripts.)
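For example, a Dockerfile structured like the following (a sketch with assumed file names) keeps the dependency-installation layers cached, so rebuilding after a script change only re-runs the final COPY:

```dockerfile
# Dependencies first: these layers are cached and rarely rebuilt.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Scripts last: only this cheap layer is rebuilt when the code changes.
COPY run_preprocess.py .
```

Ordering the Dockerfile from least-frequently to most-frequently changed content is what makes the repeated builds fast.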
Source: https://stackoverflow.com/questions/64838474/dsl-containerop-with-python