Dataflow/apache beam: manage custom module dependencies

说谎 2021-02-09 11:11

I have a .py pipeline using Apache Beam that imports another module (.py), which is my own custom module. I have a structure like this:

├── mymain.py
└── myothermodule         


        
1 Answer
  • When you run your pipeline remotely, its dependencies must also be available on the remote workers. To do that, turn your module into a Python package: put the module file in a directory that contains an __init__.py file, and add a setup.py next to mymain.py. The layout would look like this:

    ├── mymain.py
    ├── setup.py
    └── othermodules
        ├── __init__.py
        └── myothermodule.py
    

    And import it like this:

    from othermodules import myothermodule
    

    Then you can run your pipeline with the command-line option --setup_file ./setup.py
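
For illustration, here is a rough sketch of what a full launch command might look like. It assumes mymain.py passes its command-line arguments on to Beam's PipelineOptions. The project, region and bucket values are hypothetical placeholders, and build_launch_command is only a helper name invented for this sketch. The only flag taken from the answer is --setup_file; the others are the usual Dataflow options.

```python
import shlex
import sys


def build_launch_command(project, region, temp_location, setup_file="./setup.py"):
    """Return the argv list for launching mymain.py on Dataflow.

    --setup_file points at the setup.py that sits next to mymain.py, so the
    local package gets built and installed on the remote workers.
    """
    return [
        sys.executable, "mymain.py",
        "--runner", "DataflowRunner",
        "--project", project,
        "--region", region,
        "--temp_location", temp_location,
        "--setup_file", setup_file,
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own project settings.
    command = build_launch_command(
        project="my-gcp-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(command))
```

Run the printed command from the directory that contains mymain.py and setup.py, so that the relative path ./setup.py resolves correctly.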

    A minimal setup.py file would look like this:

    import setuptools
    
    setuptools.setup(packages=setuptools.find_packages())
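
To see why the __init__.py file matters, the small sketch below recreates the layout in a temporary directory and asks setuptools which packages it would pick up. The extra notapackage directory and its helper.py are hypothetical names added only for comparison; find_packages() should skip that directory because it has no __init__.py.

```python
import pathlib
import tempfile

import setuptools

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Recreate the layout from the answer (file contents are left empty).
    (root / "othermodules").mkdir()
    (root / "othermodules" / "__init__.py").touch()
    (root / "othermodules" / "myothermodule.py").touch()

    # A directory without __init__.py, for comparison (hypothetical name).
    (root / "notapackage").mkdir()
    (root / "notapackage" / "helper.py").touch()

    # This is the same discovery call that the minimal setup.py relies on.
    found = setuptools.find_packages(where=str(root))

print(found)
```

Only othermodules is reported, so only that directory would be packaged and shipped to the workers.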
    

    The whole setup is documented here.

    And a whole example using this can be found here.
