How to trigger a Dataflow job with a Cloud Function? (Python SDK)

南笙 2021-01-07 11:10

I have a Cloud Function that is triggered by Cloud Pub/Sub. I want this function to trigger a Dataflow pipeline using the Python SDK. Here is my code:

import base64

def hello_pubsub(event, context):
    if 'data' in event:
        message = base64.b64decode(event['data']).decode('utf-8')
    else:
        message = 'hello world!'

2 Answers
  • 2021-01-07 11:50

    You have to embed your pipeline's Python code with your function. When your function is called, you simply call the pipeline's main Python function, which executes the pipeline defined in your file.

    If you developed and tested your pipeline in Cloud Shell and you already ran it on the Dataflow service, your code should have this structure:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    def run(argv=None, save_main_session=True):
      # Parse the arguments and set the pipeline options
      options = PipelineOptions(argv)
      options.view_as(SetupOptions).save_main_session = save_main_session
      # Start the Pipeline in the p variable
      p = beam.Pipeline(options=options)
      # Perform your transforms on the Pipeline
      # ...
      # Run your Pipeline
      result = p.run()
      # Wait for the end of the pipeline
      result.wait_until_finish()
    

    Thus, call this function with the correct arguments, especially runner=DataflowRunner, so that the Python code submits the pipeline to the Dataflow service.

    Delete the result.wait_until_finish() call at the end, because your function won't live as long as the Dataflow job runs.
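
    For instance, a minimal sketch of the function could look like the following; it assumes the run() entry point above is defined in (or imported into) the same file as the function, and the project, region and bucket values are placeholders:

    import base64

    def hello_pubsub(event, context):
      if 'data' in event:
        message = base64.b64decode(event['data']).decode('utf-8')
      else:
        message = 'hello world!'

      # Submit the pipeline to the Dataflow service instead of running it locally;
      # run() should not call wait_until_finish() here.
      run(argv=[
        '--runner=DataflowRunner',
        '--project=my-project-id',              # placeholder project
        '--region=us-central1',                 # placeholder region
        '--temp_location=gs://my-bucket/tmp',   # placeholder bucket
        '--job_name=pubsub-triggered-job',
      ])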

    You can also use a template if you want.

  • 2021-01-07 12:02

    You can use Cloud Dataflow templates to launch your job. You will need to code the following steps:

    • Retrieve credentials
    • Build the Dataflow service client
    • Get the GCP PROJECT_ID
    • Build the template body
    • Execute the template

    Here is an example using your base code (feel free to split it into multiple methods to reduce the code inside the hello_pubsub method).

    from googleapiclient.discovery import build
    import base64
    import google.auth
    import os

    def hello_pubsub(event, context):
        # Decode the Pub/Sub message payload
        if 'data' in event:
            message = base64.b64decode(event['data']).decode('utf-8')
        else:
            message = 'hello world!'

        # Get the default credentials and build a Dataflow API client
        credentials, _ = google.auth.default()
        service = build('dataflow', 'v1b3', credentials=credentials)
        gcp_project = os.environ["GCLOUD_PROJECT"]

        # GCS path of the template and its launch parameters
        template_path = "gs://template_file_path_on_storage/"
        template_body = {
            "parameters": {
                "keyA": "valueA",
                "keyB": "valueB",
            },
            "environment": {
                "envVariable": "value"
            }
        }

        # Launch the template as a Dataflow job
        request = service.projects().templates().launch(projectId=gcp_project, gcsPath=template_path, body=template_body)
        response = request.execute()

        print(response)
    

    In the template_body variable, the parameters values are the arguments that will be sent to your pipeline, and the environment values are used by the Dataflow service (service account, workers and network configuration).
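
    As an illustration, a template_body with concrete values might look like the sketch below; the job name, subscription, bucket and service account are placeholders, and the parameters keys depend on what your own template actually expects:

    template_body = {
        "jobName": "pubsub-triggered-job",  # placeholder job name
        "parameters": {
            # arguments expected by your template (hypothetical keys)
            "input_subscription": "projects/my-project/subscriptions/my-sub",
            "output_table": "my-project:my_dataset.my_table",
        },
        "environment": {
            # RuntimeEnvironment fields read by the Dataflow service
            "tempLocation": "gs://my-bucket/tmp",
            "serviceAccountEmail": "dataflow-runner@my-project.iam.gserviceaccount.com",
            "maxWorkers": 3,
            "network": "default",
        },
    }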

    LaunchTemplateParameters documentation

    RuntimeEnvironment documentation
