Is it possible to predict in SageMaker without using S3?


Question


I have a .pkl model which I would like to put into production. I would like to run a daily query against my SQL Server and make predictions on about 1000 rows. The documentation implies I have to load the daily data into S3. Is there a way around this? The data should fit in memory without any problem.

The answer to "Is there some kind of persistent local storage in AWS SageMaker model training?" says that "The notebook instance is coming with a local EBS (5GB) that you can use to copy some data into it and run the fast development iterations without copying the data every time from S3." The 5GB could be enough, but I am not sure you can actually run production predictions from a notebook in this manner. If I set up a VPN, could I just query the server directly using pyodbc?
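
Something like the following is what I have in mind; the driver, server, and credentials below are placeholders:

import pyodbc

# Placeholder connection details -- assumes VPN/VPC connectivity to the SQL Server
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-sql-server.example.com;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM daily_rows")   # ~1000 rows, fits in memory
rows = cursor.fetchall()
conn.close()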

Is there SageMaker integration with AWS Lambda? That, in combination with a Docker container, would suit my needs.


Answer 1:


While you need to specify an S3 "folder" as input, this folder can contain just a dummy file. Also, if you bring your own Docker container for training, as in this example, you can do pretty much everything in it. So you could run your daily query inside the Docker container, because training containers have access to the internet.

Inside this container you also have access to all the other AWS services; your access is defined by the IAM role you pass to the training job.
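
For instance, here is a minimal sketch of a custom training entry point that pulls its data straight from the database instead of from S3 (the host, credentials, table name, and the training step itself are placeholders, not part of the SageMaker API):

# train.py -- hypothetical entry point baked into the custom training image
import pymysql

def main():
    # The training container can reach the database directly
    conn = pymysql.connect(host="db-host.example.com", user="user",
                           passwd="password", db="mydb")
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM daily_rows")
        rows = cur.fetchall()
    conn.close()
    # ... train or predict on `rows` here; nothing needs to come from S3 ...

if __name__ == "__main__":
    main()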




Answer 2:


You can create an endpoint on SageMaker to host your pickled model and make predictions by invoking the endpoint from AWS Lambda. An S3 bucket is not necessary for real-time predictions; only Batch Transform, which is non-real-time batch inference, requires one. For predictions on up to 1000 rows you can use real-time inference inside a Lambda function. The Lambda code looks roughly like this:

import sys
import logging
import rds_config
import pymysql
import boto3
import json

# RDS settings
rds_host  = "rds-instance-endpoint"
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

# sagemaker client
sagemaker = boto3.client('sagemaker-runtime', region_name='<your region>')

logger = logging.getLogger()
logger.setLevel(logging.INFO)

try:
    conn = pymysql.connect(host=rds_host, user=name, passwd=password,
                           db=db_name, connect_timeout=5)
except pymysql.MySQLError:
    logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
    sys.exit()

logger.info("SUCCESS: Connection to RDS mysql instance succeeded")
def handler(event, context):
    """
    This function fetches content from mysql RDS instance
    """

    item_count = 0

    with conn.cursor() as cur:
        cur.execute("select * from table_name")
        for row in cur:
            # format the row as a CSV string to match the prediction payload
            item_count += 1
            new_row = ','.join(str(col) for col in row)
            response = sagemaker.invoke_endpoint(
                EndpointName='ServiceEndpoint',
                Body=new_row, 
                ContentType='text/csv'
            )
            prediction = json.loads(response['Body'].read().decode())
            print(prediction)
            # store predictions somewhere if needed

    return "Made predictioncs on %d items from RDS MySQL table" %(item_count)



Answer 3:


The S3 data is not required.

Here's the link to SageMaker documentation page: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ContainerDefinition.html#SageMaker-Type-ContainerDefinition-ModelDataUrl

A prediction can be made via an in-line text blob or a data file (binary, plain text, CSV, JSON, etc.).
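
For example, a real-time prediction with an in-line CSV text blob can be made directly with boto3 (the endpoint name, region, and payload are placeholders):

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
response = runtime.invoke_endpoint(
    EndpointName="ServiceEndpoint",   # placeholder endpoint name
    Body="5.1,3.5,1.4,0.2",           # in-line CSV blob, no file needed
    ContentType="text/csv",
)
print(response["Body"].read().decode())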



Source: https://stackoverflow.com/questions/51391639/is-it-possible-to-predict-in-sagemaker-without-using-s3
