How to install Python dependencies for Dataflow

Submitted by ◇◆丶佛笑我妖孽 on 2020-03-05 06:05:08

Question


I have a very small Python Dataflow package. Its structure looks like this:

.
├── __pycache__
├── pubsubtobigq.py
├── requirements.txt
└── venv

The content of requirements.txt is:

protobuf==3.11.2
protobuf3-to-dict==0.1.5

I ran my pipeline using this command:

python -m pubsubtobigq \
  --input_topic "projects/project_name/topics/topic_name" \
  --job_name "job_name" \
  --output "gs://mybucket/wordcount/outputs" \
  --runner DataflowRunner \
  --project "project_name"  \
  --region "us-central1" \
  --temp_location "gs://mybucket/tmp/" \
  --staging_location "gs://mybucket/staging" \
  --requirements_file requirements.txt \
  --streaming True

The code that uses this library looks like this:

from protobuf_to_dict import protobuf_to_dict

def parse_proto(message):
    # Convert the protobuf message into a plain Python dict
    dictionary = protobuf_to_dict(message)
    return dictionary

But this line fails with a NameError saying that protobuf_to_dict is not defined. Even if I use Google's built-in MessageToDict from google.protobuf.json_format instead, I get the same error.

How can I fix this? I need to install one of these libraries.

EDIT

Error message when I used MessageToDict from google.protobuf.json_format:

Error processing instruction -31. Original traceback is Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 813, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 449, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/Users/username/repos/dataflow-pipeline/venv/lib/python3.7/site-packages/apache_beam/transforms/core.py", line 1415, in
    wrapper = lambda x: [fn(x)]
  File "/Users/username/repos/dataflow-pipeline/pubsubtobigq.py", line 16, in parse_proto
NameError: name 'MessageToDict' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 143, in _execute
    response = task()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 193, in
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 291, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 317, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 675, in process_bundle
    data.transform_id].process_encoded(data.data)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 146, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 258, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 259, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 146, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 596, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 809, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 815, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 882, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/init.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 813, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 449, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/Users/username/repos/dataflow-pipeline/venv/lib/python3.7/site-packages/apache_beam/transforms/core.py", line 1415, in
    wrapper = lambda x: [fn(x)]
  File "/Users/username/repos/dataflow-pipeline/pubsubtobigq.py", line 16, in parse_proto
NameError: name 'MessageToDict' is not defined [while running 'generatedPtransform-23']

Error message when I used protobuf_to_dict:

Error processing instruction -32. Original traceback is Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 813, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 449, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/Users/username/repos/dataflow-pipeline/venv/lib/python3.7/site-packages/apache_beam/transforms/core.py", line 1415, in
    wrapper = lambda x: [fn(x)]
  File "/Users/username/repos/dataflow-pipeline/pubsubtobigq.py", line 21, in parse_proto
NameError: name 'protobuf_to_dict' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 143, in _execute
    response = task()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 193, in
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 291, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 317, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 675, in process_bundle
    data.transform_id].process_encoded(data.data)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 146, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 258, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 259, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 146, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 596, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 809, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 815, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 882, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/init.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 813, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 449, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/Users/username/repos/dataflow-pipeline/venv/lib/python3.7/site-packages/apache_beam/transforms/core.py", line 1415, in
    wrapper = lambda x: [fn(x)]
  File "/Users/username/repos/dataflow-pipeline/pubsubtobigq.py", line 21, in parse_proto
NameError: name 'protobuf_to_dict' is not defined [while running 'generatedPtransform-22']

Source: https://stackoverflow.com/questions/59993785/how-to-install-python-dependencies-for-dataflow
