Question
This is my problem: I have to run a SageMaker Processing job that executes custom PySpark code. So far I've been using the SageMaker Python SDK and running it like this:
import sagemaker

# Two ml.m5.xlarge instances running the Spark 2.4 processing container
spark_processor = sagemaker.spark.processing.PySparkProcessor(
    base_job_name="spark-preprocessor",
    framework_version="2.4",
    role=role_arn,
    instance_count=2,
    instance_type="ml.m5.xlarge",
    max_runtime_in_seconds=1800,
)

# Submit processing.py and pass the bucket/key as plain key/value arguments
spark_processor.run(
    submit_app="processing.py",
    arguments=[
        "s3_input_bucket", bucket_name,
        "s3_input_file_path", file_path,
    ],
)
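For reference, whatever is passed in arguments gets appended to the spark-submit invocation after processing.py, so the script sees those strings in sys.argv. The following is only a simplified sketch of how such a script could pick up the bucket and file path; the CSV read and the column handling are placeholders, not my actual code:

# processing.py -- simplified sketch that reads the key/value pairs
# passed through `arguments` from sys.argv.
import sys

from pyspark.sql import SparkSession

def parse_args(argv):
    # argv looks like ["s3_input_bucket", "<bucket>", "s3_input_file_path", "<key>"]
    return dict(zip(argv[0::2], argv[1::2]))

if __name__ == "__main__":
    args = parse_args(sys.argv[1:])
    spark = SparkSession.builder.appName("spark-preprocessor").getOrCreate()

    input_uri = "s3://{}/{}".format(args["s3_input_bucket"], args["s3_input_file_path"])
    df = spark.read.csv(input_uri, header=True)  # placeholder: assumes CSV input

    # ... actual preprocessing happens here ...
    df.show()

    spark.stop()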
Now I have to automate the workflow using Step Functions. For that I've written a Lambda function to launch the job, but I get the following error:
{
  "errorMessage": "Unable to import module 'lambda_function': No module named 'sagemaker'",
  "errorType": "Runtime.ImportModuleError"
}
This is my Lambda function:

import sagemaker

def lambda_handler(event, context):
    # role_arn is defined elsewhere (e.g. read from an environment variable)
    spark_processor = sagemaker.spark.processing.PySparkProcessor(
        base_job_name="spark-preprocessor",
        framework_version="2.4",
        role=role_arn,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        max_runtime_in_seconds=1800,
    )
    spark_processor.run(
        submit_app="processing.py",
        arguments=[
            "s3_input_bucket", event["bucket_name"],
            "s3_input_file_path", event["file_path"],
        ],
    )
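Since the error is just that the sagemaker package isn't available in the Lambda runtime, one alternative I'm considering is to skip the SDK entirely and start the job through boto3 (which is preinstalled in Lambda) via the CreateProcessingJob API. Below is only a rough sketch of what that handler might look like; the Spark container image URI, the smspark-submit entrypoint, and the S3 location of processing.py are placeholders I haven't verified:

import time
import boto3

sm_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Unique job name per invocation
    job_name = "spark-preprocessor-{}".format(int(time.time()))

    sm_client.create_processing_job(
        ProcessingJobName=job_name,
        RoleArn=event["role_arn"],  # or read it from an environment variable
        AppSpecification={
            # Placeholder: region/account-specific SageMaker Spark processing image
            "ImageUri": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-spark-processing:2.4-cpu",
            "ContainerEntrypoint": [
                "smspark-submit",
                "/opt/ml/processing/input/code/processing.py",
            ],
            "ContainerArguments": [
                "s3_input_bucket", event["bucket_name"],
                "s3_input_file_path", event["file_path"],
            ],
        },
        ProcessingInputs=[
            {
                "InputName": "code",
                "S3Input": {
                    "S3Uri": "s3://<my-code-bucket>/processing.py",  # where the script is uploaded
                    "LocalPath": "/opt/ml/processing/input/code",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }
        ],
        ProcessingResources={
            "ClusterConfig": {
                "InstanceCount": 2,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            }
        },
        StoppingCondition={"MaxRuntimeInSeconds": 1800},
    )
    return {"ProcessingJobName": job_name}

I also saw that Step Functions has a direct sagemaker:createProcessingJob service integration that takes the same field structure, so maybe the Lambda could be dropped altogether, but I'm not sure how to wire the PySpark script into that.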
My question is: how can I create a step in my state machine that runs this PySpark code with SageMaker Processing?
Thank you
Source: https://stackoverflow.com/questions/65041847/sagemaker-processing-job-with-pyspark-and-step-functions