Question
This is my problem: I have to run a SageMaker Processing job that executes custom PySpark code. So far I've been using the SageMaker Python SDK and running it like this:
import sagemaker

# Two ml.m5.xlarge instances running the Spark 2.4 processing container
spark_processor = sagemaker.spark.processing.PySparkProcessor(
    base_job_name="spark-preprocessor",
    framework_version="2.4",
    role=role_arn,
    instance_count=2,
    instance_type="ml.m5.xlarge",
    max_runtime_in_seconds=1800,
)

# Submit processing.py and pass the bucket/key as plain key/value arguments
spark_processor.run(
    submit_app="processing.py",
    arguments=[
        "s3_input_bucket", bucket_name,
        "s3_input_file_path", file_path,
    ],
)
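For reference, whatever is passed in arguments gets appended to the spark-submit invocation after processing.py, so the script sees those strings in sys.argv. The following is only a simplified sketch of how such a script could pick up the bucket and file path; the CSV read and the column handling are placeholders, not my actual code:

# processing.py -- simplified sketch that reads the key/value pairs
# passed through `arguments` from sys.argv.
import sys

from pyspark.sql import SparkSession

def parse_args(argv):
    # argv looks like ["s3_input_bucket", "<bucket>", "s3_input_file_path", "<key>"]
    return dict(zip(argv[0::2], argv[1::2]))

if __name__ == "__main__":
    args = parse_args(sys.argv[1:])
    spark = SparkSession.builder.appName("spark-preprocessor").getOrCreate()

    input_uri = "s3://{}/{}".format(args["s3_input_bucket"], args["s3_input_file_path"])
    df = spark.read.csv(input_uri, header=True)  # placeholder: assumes CSV input

    # ... actual preprocessing happens here ...
    df.show()

    spark.stop()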
Now I have to automate the workflow using Step Functions. For that I've written a Lambda function to launch the job, but I get the following error:
{
  "errorMessage": "Unable to import module 'lambda_function': No module named 'sagemaker'",
  "errorType": "Runtime.ImportModuleError"
}
This is my Lambda function:

import sagemaker

def lambda_handler(event, context):
    # role_arn is defined elsewhere (e.g. read from an environment variable)
    spark_processor = sagemaker.spark.processing.PySparkProcessor(
        base_job_name="spark-preprocessor",
        framework_version="2.4",
        role=role_arn,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        max_runtime_in_seconds=1800,
    )
    spark_processor.run(
        submit_app="processing.py",
        arguments=[
            "s3_input_bucket", event["bucket_name"],
            "s3_input_file_path", event["file_path"],
        ],
    )
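Since the error is just that the sagemaker package isn't available in the Lambda runtime, one alternative I'm considering is to skip the SDK entirely and start the job through boto3 (which is preinstalled in Lambda) via the CreateProcessingJob API. Below is only a rough sketch of what that handler might look like; the Spark container image URI, the smspark-submit entrypoint, and the S3 location of processing.py are placeholders I haven't verified:

import time
import boto3

sm_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Unique job name per invocation
    job_name = "spark-preprocessor-{}".format(int(time.time()))

    sm_client.create_processing_job(
        ProcessingJobName=job_name,
        RoleArn=event["role_arn"],  # or read it from an environment variable
        AppSpecification={
            # Placeholder: region/account-specific SageMaker Spark processing image
            "ImageUri": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-spark-processing:2.4-cpu",
            "ContainerEntrypoint": [
                "smspark-submit",
                "/opt/ml/processing/input/code/processing.py",
            ],
            "ContainerArguments": [
                "s3_input_bucket", event["bucket_name"],
                "s3_input_file_path", event["file_path"],
            ],
        },
        ProcessingInputs=[
            {
                "InputName": "code",
                "S3Input": {
                    "S3Uri": "s3://<my-code-bucket>/processing.py",  # where the script is uploaded
                    "LocalPath": "/opt/ml/processing/input/code",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }
        ],
        ProcessingResources={
            "ClusterConfig": {
                "InstanceCount": 2,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            }
        },
        StoppingCondition={"MaxRuntimeInSeconds": 1800},
    )
    return {"ProcessingJobName": job_name}

I also saw that Step Functions has a direct sagemaker:createProcessingJob service integration that takes the same field structure, so maybe the Lambda could be dropped altogether, but I'm not sure how to wire the PySpark script into that.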
My question is: how can I create a step in my state machine that runs this PySpark code with SageMaker Processing?
Thank you
Source: https://stackoverflow.com/questions/65041847/sagemaker-processing-job-with-pyspark-and-step-functions