Question
After creating an SKLearn() instance and running a HyperparameterTuner over a few hyperparameter ranges, I get the best estimator. When I call deploy() on that estimator, the log shows an error. Exactly the same error occurs when I create a transformer and call transform() on it. Neither deploy nor transform works. What could be the problem, and how could I at least narrow it down?
I have no idea how to even begin to figure this out. Googling didn't help; nothing comes up.
Creating SKLearn instance:
sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.c4.xlarge",
    role=role,
    sagemaker_session=session,
    hyperparameters={'model': 'rfc'})
Putting tuner to work:
tuner = HyperparameterTuner(estimator=sklearn,
                            objective_metric_name=objective_metric_name,
                            objective_type='Minimize',
                            metric_definitions=metric_definitions,
                            hyperparameter_ranges=hyperparameters,
                            max_jobs=3,  # 9,
                            max_parallel_jobs=4)
tuner.fit({'train': s3_input_train})
tuner.wait()
best_training_job = tuner.best_training_job()
the_best_estimator = sagemaker.estimator.Estimator.attach(best_training_job)
This gives a valid best training job. Everything seems great.
Here is where the problem manifests:
predictor = the_best_estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
or the following (which triggers exactly the same problem):
rfc_transformer = the_best_estimator.transformer(1, instance_type="ml.m4.xlarge")
rfc_transformer.transform(test_location)
rfc_transformer.wait()
Here is the log with the error message (it reiterates the same error many times while trying to deploy or transform; here is the beginning of the log):
................[2019-09-22 09:17:48 +0000] [17] [INFO] Starting gunicorn 19.9.0
[2019-09-22 09:17:48 +0000] [17] [INFO] Listening at: unix:/tmp/gunicorn.sock (17)
[2019-09-22 09:17:48 +0000] [17] [INFO] Using worker: gevent
[2019-09-22 09:17:48 +0000] [24] [INFO] Booting worker with pid: 24
[2019-09-22 09:17:48 +0000] [25] [INFO] Booting worker with pid: 25
[2019-09-22 09:17:48 +0000] [26] [INFO] Booting worker with pid: 26
[2019-09-22 09:17:48 +0000] [30] [INFO] Booting worker with pid: 30
2019-09-22 09:18:15,061 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2019-09-22 09:18:15,062 INFO - sagemaker_sklearn_container.serving - Encountered an unexpected error.
[2019-09-22 09:18:15 +0000] [24] [ERROR] Error handling request /ping
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base_async.py", line 56, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
    addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base_async.py", line 107, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.5/dist-packages/sagemaker_sklearn_container/serving.py", line 119, in main
    user_module_transformer = import_module(serving_env.module_name, serving_env.module_dir)
  File "/usr/local/lib/python3.5/dist-packages/sagemaker_sklearn_container/serving.py", line 97, in import_module
    user_module = importlib.import_module(module_name)
  File "/usr/lib/python3.5/importlib/__init__.py", line 117, in import_module
    if name.startswith('.'):
AttributeError: 'NoneType' object has no attribute 'startswith'
169.254.255.130 - - [22/Sep/2019:09:18:15 +0000] "GET /ping HTTP/1.1" 500 141 "-" "Go-http-client/1.1"
2019-09-22 09:18:15,178 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2019-09-22 09:18:15,179 INFO - sagemaker_sklearn_container.serving - Encountered an unexpected error.
[2019-09-22 09:18:15 +0000] [30] [ERROR] Error handling request /ping
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base_async.py", line 56, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
    addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base_async.py", line 107, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.5/dist-packages/sagemaker_sklearn_container/serving.py", line 119, in main
    user_module_transformer = import_module(serving_env.module_name, serving_env.module_dir)
  File "/usr/local/lib/python3.5/dist-packages/sagemaker_sklearn_container/serving.py", line 97, in import_module
    user_module = importlib.import_module(module_name)
  File "/usr/lib/python3.5/importlib/__init__.py", line 117, in import_module
    if name.startswith('.'):
Answer 1:
Double-check that you have the necessary environment variables set. I ran into this issue when I didn't set the environment variables SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT, SAGEMAKER_PROGRAM, and SAGEMAKER_SUBMIT_DIRECTORY. Check a working base model to see which environment variables need to be set.
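As a rough way to do that check, here is a minimal sketch. It compares a model's container environment against the three variables named above. The helper names (missing_env_vars, model_environment) and the sample dictionaries are made up for illustration, and treating exactly these three variables as the required set is an assumption taken from this answer, not from the SageMaker documentation. The traceback in the question shows module_name being None when importlib.import_module is called, which would be consistent with SAGEMAKER_PROGRAM not being set, but the log itself does not confirm that mapping.

```python
# Variables named in the answer; treating them as the required set is an assumption.
REQUIRED_ENV_VARS = (
    "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT",
    "SAGEMAKER_PROGRAM",
    "SAGEMAKER_SUBMIT_DIRECTORY",
)


def missing_env_vars(environment):
    """Return the required variable names that are absent or empty in `environment`."""
    return [name for name in REQUIRED_ENV_VARS if not environment.get(name)]


def model_environment(model_name):
    """Fetch the container environment of an existing SageMaker model.

    Needs boto3 and AWS credentials, so boto3 is imported lazily and this
    function is not called in the demo below. Single-container models are
    assumed (the environment is read from PrimaryContainer).
    """
    import boto3

    response = boto3.client("sagemaker").describe_model(ModelName=model_name)
    return response.get("PrimaryContainer", {}).get("Environment", {})


# Hypothetical environments, standing in for what describe_model might return.
working_env = {
    "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
    "SAGEMAKER_PROGRAM": "train_and_serve.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "s3://example-bucket/source/sourcedir.tar.gz",
}
broken_env = {}

print("working model is missing:", missing_env_vars(working_env))
print("broken model is missing:", missing_env_vars(broken_env))
```

With real models, you could call model_environment() on both a model that deploys correctly and the failing one, then pass each result to missing_env_vars() to see which variables differ.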
Source: https://stackoverflow.com/questions/58050712/problem-deploying-the-best-estimator-gotten-with-sagemaker-estimator-estimator