How to save sklearn model on s3 using joblib.dump?

暖寄归人 2021-02-15 16:09

I have a scikit-learn model and I want to save it as a pickle file in my S3 bucket using joblib.dump.

I used joblib.dump(model, 'model.pkl') to save the model locally.

3 Answers
  • 2021-02-15 16:20

    Here's a way that worked for me. It's pretty straightforward. I'm using joblib (it handles objects that carry large NumPy arrays, such as many sklearn models, more efficiently than plain pickle), but you could use pickle too.
    I'm also using temporary files to transfer the model to and from S3, but you could store the file in a more permanent location if you prefer.

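    import tempfile
    import boto3
    import joblib
    
    # put_object, download_fileobj and delete_object are S3 client methods
    s3_client = boto3.client("s3")
    
    bucket_name = "my-bucket"
    key = "model.pkl"
    
    # WRITE: dump the fitted model (from the question) to a temp file, then upload its bytes
    with tempfile.TemporaryFile() as fp:
        joblib.dump(model, fp)
        fp.seek(0)
        s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)
    
    # READ: download the object into a temp file, then load the model from it
    with tempfile.TemporaryFile() as fp:
        s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
        fp.seek(0)
        model = joblib.load(fp)
    
    # DELETE: remove the stored model from the bucket
    s3_client.delete_object(Bucket=bucket_name, Key=key)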
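
If the model fits comfortably in memory, the temporary file in the answer above can be swapped for an in-memory buffer. The following is only a sketch of that variation, not the answerer's method: the helper names `dump_to_s3` and `load_from_s3` are hypothetical, and `s3_client` is assumed to be an object exposing `upload_fileobj` and `download_fileobj` the way a boto3 S3 client does.

```python
import io

import joblib


def dump_to_s3(obj, s3_client, bucket, key):
    """Serialize obj with joblib into memory and upload it (hypothetical helper)."""
    buffer = io.BytesIO()
    joblib.dump(obj, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    s3_client.upload_fileobj(Fileobj=buffer, Bucket=bucket, Key=key)


def load_from_s3(s3_client, bucket, key):
    """Download an object into memory and deserialize it (hypothetical helper)."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    buffer.seek(0)  # rewind before handing the buffer to joblib
    return joblib.load(buffer)
```

With a real client this would be called as `dump_to_s3(model, boto3.client("s3"), "my-bucket", "model.pkl")`, followed later by `model = load_from_s3(boto3.client("s3"), "my-bucket", "model.pkl")`. Taking the client as a parameter also makes it easy to pass a stub in tests.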
    
  • 2021-02-15 16:26

    Use the following code to dump your model to an S3 location in .pkl or .sav format:

    import tempfile
    import boto3
    import joblib
    s3 = boto3.resource('s3')
    
    # you can dump it in .sav or .pkl format 
    location = 's3://bucket_name/folder_name/'
    model_filename = 'model.sav'  # use any extension you want (.pkl or .sav)
    OutputFile = location + model_filename
    
    # WRITE
    with tempfile.TemporaryFile() as fp:
        joblib.dump(scikit_learn_model, fp)
        fp.seek(0)
        # Key is used verbatim as the object name, so the 's3://bucket_name/' prefix in location ends up inside the key.
        s3.Bucket('bucket_name').put_object(Key=OutputFile, Body=fp.read())
    
  • 2021-02-15 16:29

    Just correcting Sayali Sonawane's answer (the object key should be relative to the bucket, without the s3://bucket_name/ prefix):

    import tempfile
    import boto3
    import joblib
    s3 = boto3.resource('s3')
    
    # you can dump it in .sav or .pkl format 
    location = 'folder_name/' # THIS is the change to make the code work
    model_filename = 'model.sav'  # use any extension you want (.pkl or .sav)
    OutputFile = location + model_filename
    
    # WRITE
    with tempfile.TemporaryFile() as fp:
        joblib.dump(scikit_learn_model, fp)
        fp.seek(0)
        # pass the bucket name to Bucket() and the bucket-relative key (no s3:// prefix) as Key
        s3.Bucket('bucket_name').put_object(Key=OutputFile, Body=fp.read())
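
The correction above works because `put_object` takes the bucket name and the object key as separate values. If your configuration stores a full `s3://` URI, one option is to split it before calling boto3. The helper below is a hypothetical sketch (the name `split_s3_uri` is not part of boto3) that uses only the standard library.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket name; the path keeps a leading slash that is not part of the key
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://bucket_name/folder_name/model.sav")
print(bucket)  # bucket_name
print(key)     # folder_name/model.sav
```

The two values can then be passed as `s3.Bucket(bucket).put_object(Key=key, Body=...)`.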
    