Question
I've successfully trained an LDA model with SageMaker and have been able to set up an inference API, but it limits how many records I can query at a time.
I need to get predictions for a large file and have been trying to use Batch Transform, but I'm running into a roadblock.
My input data is in the application/x-recordio-protobuf content type; the code is as follows:
import sagemaker

# model_name, output_location, and input_location are defined earlier in the notebook.
# Initialize the transformer object
transformer = sagemaker.transformer.Transformer(
    base_transform_job_name='Batch-Transform',
    model_name=model_name,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    output_path=output_location,
    max_payload=20,
    strategy='MultiRecord'
)
# Start a transform job
transformer.transform(input_location, content_type='application/x-recordio-protobuf', split_type='RecordIO')
# Then wait until the transform job has completed
transformer.wait()
# Fetch validation result
# bucket and s3_client (a boto3 S3 client) are defined earlier in the notebook.
s3_client.download_file(bucket,
                        'topic_model_batch_transform/output/batch_tansform_part0.pbr.out',
                        'batch_tansform-result')
with open('batch_tansform-result') as f:
    results = f.readlines()
print("Sample transform result: {}".format(results[0]))
I have chunked my input file into 10 files, each around 19MB in size. To start, I am attempting to run on a single chunk, i.e. 19MB in total. I have tried changing the strategy to SingleRecord, and I have also tried different split_type values, including None and "Line". For reference, the variants amount to roughly the sketch below.
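(A sketch of the variants tried, reusing the Transformer defined above; not the exact code I ran:)

# Variant attempts (sketch): strategy='SingleRecord' on the Transformer,
# plus different split_type values on the transform call.
transformer.transform(input_location,
                      content_type='application/x-recordio-protobuf',
                      split_type=None)  # also tried split_type='Line'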
I've read the documentation, but it's not clear what else I should try, and the error messages are very unclear:
2019-04-02T15:49:47.617:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=20, BatchStrategy=MULTI_RECORD
#011at java.lang.Thread.run(Thread.java:748)
2019-04-02T15:49:48.035:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr: Bad HTTP status returned from invoke: 413
2019-04-02T15:49:48.036:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr:
2019-04-02T15:49:48.036:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr: Message:
2019-04-02T15:49:48.036:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2019-04-02T15:49:48.036:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr: <title>413 Request Entity Too Large</title>
2019-04-02T15:49:48.036:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr: <h1>Request Entity Too Large</h1>
2019-04-02T15:49:48.036:[sagemaker logs]: du-sagemaker/data/batch_transform/batch_tansform_part0.pbr: <p>The data value transmitted exceeds the capacity limit.</p>
The above is the last error I got with this configuration; before that I was also getting an HTTP 400 error code.
Any help or pointers would be greatly appreciated! Thank you
Answer 1:
While the Batch Transform platform supports flexible payload limits (via MaxPayloadInMB), many algorithms set stricter internal limits. This is true for the SageMaker built-in LDA algorithm, which rejects "large" requests according to its internal configuration.
The error you see in the log says exactly this: the Batch Transform client attempted to send a request as large as 20MB, but the LDA algorithm server rejected the request with error code 413 (Request Entity Too Large).
When using a SageMaker built-in algorithm container, or any container that is not your own, we recommend leaving the parameter MaxPayloadInMB unset in your CreateTransformJob request. This prompts the platform to choose the algorithm's default execution parameters, which you will see printed in your log like so:
[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=${DEFAULT_MAX_PAYLOAD_IN_MB}, BatchStrategy=MultiRecord
For more insight on how these "execution parameters" are resolved, see the "order of precedence" documented here.
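With the SageMaker Python SDK, this should correspond to simply not passing max_payload when constructing the Transformer. A minimal sketch, assuming model_name, output_location, and input_location are defined as in the question:

import sagemaker

# Leave max_payload (MaxPayloadInMB) unset so the platform falls back to
# the algorithm container's default execution parameters.
transformer = sagemaker.transformer.Transformer(
    base_transform_job_name='Batch-Transform',
    model_name=model_name,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    output_path=output_location,
    strategy='MultiRecord'
)
transformer.transform(input_location,
                      content_type='application/x-recordio-protobuf',
                      split_type='RecordIO')
transformer.wait()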
Aside from controlling payload size, your other transform job parameter choices (SplitType=RecordIO and BatchStrategy=MultiRecord) look correct for passing RecordIO-Protobuf data.
Answer 2:
I managed to resolve the issue: the max payload I was using was too high. I set MaxPayloadInMB=1 and it now runs like a dream.
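In terms of the code in the question, that is just a one-parameter change (sketch):

# Same Transformer as in the question, but with max_payload=1
# (MaxPayloadInMB=1) so each request stays under the LDA container's limit.
transformer = sagemaker.transformer.Transformer(
    base_transform_job_name='Batch-Transform',
    model_name=model_name,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    output_path=output_location,
    max_payload=1,
    strategy='MultiRecord'
)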
Source: https://stackoverflow.com/questions/55479366/errors-running-sagemaker-batch-transformation-with-lda-model