How to build hybrid model to find optimal discount of products?

花落未央 2021-02-04 09:44

I need to find the optimal discount for each product (e.g. A, B, C) so that I can maximize total sales. I have existing Random Forest models for each product that map discount an

1 answer
  •  攒了一身酷
    2021-02-04 10:18

    You can find a complete solution below.

    The fundamental differences from your approach are the following:

    1. Since the Random Forest model takes the season feature as input, optimal discounts must be computed separately for every season.
    2. According to the pyswarm documentation, the constraint function con must satisfy con(x) >= 0.0 at a feasible point. The correct constraint is therefore 20 - sum(...), not the other way around. In addition, the units and mrp variables were not given; I assumed a value of 1 for each, so you may want to change those values.
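
    As a minimal sketch of the sign convention in point 2: the constraint returns the remaining budget, so it is non-negative exactly when the discounted spend fits within the budget. The units and mrp values below are hypothetical placeholders (they were not given in the question), chosen only so that one candidate is feasible and the other is not.

```python
budget = 20

# Hypothetical volumes and list prices; the question did not provide them.
units = [100, 50, 80]
mrp = [1.0, 2.0, 0.5]

def spend(x):
    """Total discount spend for a vector of discount fractions x."""
    return sum(u * p * d for u, p, d in zip(units, mrp, x))

def con(x):
    """Remaining budget; a point is feasible when this is >= 0."""
    return budget - spend(x)

modest = [0.05, 0.05, 0.10]      # small discounts on every product
generous = [0.30, 0.40, 0.40]    # discounts at the upper bounds

print("modest:", con(modest))      # positive, so feasible
print("generous:", con(generous))  # negative, so infeasible
```

    Writing the constraint as sum(...) - 20 would flip both signs, so the optimizer would reject the affordable point and accept the unaffordable one.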

    Additional modifications to your original code include:

    1. sklearn's preprocessing and Pipeline wrappers are used to simplify the preprocessing steps.
    2. The optimal parameters are stored in an output .xlsx file.
    3. The maxiter parameter of the PSO is set to 5 to speed up debugging; you will probably want a larger value for real runs (default = 100).
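
    To illustrate what the iteration cap in point 3 trades off, here is a toy particle swarm written in plain Python. It is not pyswarm's implementation: the function toy_pso, its coefficients (all 0.5), the swarm size, and the quadratic objective with made-up targets are assumptions for illustration only. With the same seed, a longer run can only match or improve on the best value found by a shorter one.

```python
import random

def toy_pso(obj, lb, ub, swarmsize=20, maxiter=5, seed=0):
    """Minimal particle swarm minimizer (illustrative sketch only)."""
    rng = random.Random(seed)
    dim = len(lb)
    # Random start positions inside the box, zero start velocities.
    pos = [[rng.uniform(lb[d], ub[d]) for d in range(dim)]
           for _ in range(swarmsize)]
    vel = [[0.0] * dim for _ in range(swarmsize)]
    best_pos = [p[:] for p in pos]           # personal best positions
    best_val = [obj(p) for p in pos]         # personal best values
    g = min(range(swarmsize), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far

    for _ in range(maxiter):
        for i in range(swarmsize):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (0.5 * vel[i][d]
                             + 0.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 0.5 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clip back into the bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lb[d]), ub[d])
            val = obj(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Hypothetical objective: squared distance to made-up "ideal" discounts.
targets = [0.2, 0.3, 0.1]
def toy_obj(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, targets))

lb = [0.0, 0.0, 0.0]
ub = [0.3, 0.4, 0.4]
x_5, f_5 = toy_pso(toy_obj, lb, ub, maxiter=5)
x_100, f_100 = toy_pso(toy_obj, lb, ub, maxiter=100)
print("maxiter=5:  ", f_5)
print("maxiter=100:", f_100)
```

    A result obtained with maxiter=5 should therefore be read as a quick smoke test rather than a converged optimum.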

    The resulting code is:

    import pandas as pd 
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestRegressor 
    from sklearn.base import clone
    
    # ====================== RF TRAINING ======================
    # Preprocessing
    def build_sample(season, discount_percentage):
        return pd.DataFrame({
            'season': [season],
            'discount_percentage': [discount_percentage]
        })
    
    columns_to_encode = ["season"]
    columns_to_scale = ["discount_percentage"]
    encoder = OneHotEncoder()
    scaler = StandardScaler()
    preproc = ColumnTransformer(
        transformers=[
            ("encoder", Pipeline([("OneHotEncoder", encoder)]), columns_to_encode),
            ("scaler", Pipeline([("StandardScaler", scaler)]), columns_to_scale)
        ]
    )
    
    # Model
    # Regressor (one clone of this pipeline is fitted per product below)
    rf_regressor = RandomForestRegressor(
        n_estimators=500,
        random_state=12,
        bootstrap=True,
        oob_score=True)
    
    pipeline_list = [
        ('preproc', preproc),
        ('clf', rf_regressor)
    ]
    
    pipe = Pipeline(pipeline_list)
    
    # Dataset
    df_tot = pd.read_excel("so_data.xlsx")
    df_dict = {
        product: df_tot[df_tot['product'] == product].drop(columns=['product']) for product in pd.unique(df_tot['product'])
    }
    
    # Fit
    print("Training ...")
    pipe_dict = {
        product: clone(pipe) for product in df_dict.keys()
    }
    
    for product, df in df_dict.items():
        X = df.drop(columns=["sales_uplift_norm"])
        y = df["sales_uplift_norm"]
        pipe_dict[product].fit(X,y)
    
    # ====================== OPTIMIZATION ====================== 
    from pyswarm import pso
    # PSO iteration cap, kept small for quick debugging (pyswarm default: 100)
    maxiter = 5
    
    n_product = len(pipe_dict.keys())
    
    # Constraints
    # units and mrp were not given in the question; 1 is a placeholder value
    budget = 20
    units  = [1, 1, 1]
    mrp    = [1, 1, 1]
    
    lb = [0.0, 0.0, 0.0]
    ub = [0.3, 0.4, 0.4]
    
    # Budget constraint: pyswarm treats x as feasible when con(x) >= 0
    def con(x):
        s = 0
        for i in range(n_product):
            s += units[i] * mrp[i] * x[i]
    
        return budget - s
    
    print("Optimization ...")
    
    # Save optimal discounts for every product and every season
    df_opti = pd.DataFrame(data=None, columns=df_tot.columns)
    for season in pd.unique(df_tot['season']):
    
        # Objective function to minimize
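        def obj(x):
            s = 0.0
            for i, product in enumerate(pipe_dict.keys()):
                # predict() returns a length-1 array; take the scalar
                s += pipe_dict[product].predict(build_sample(season, x[i]))[0]
            
            # pso minimizes, so return the negated total uplift
            return -s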
    
        # PSO
        xopt, fopt = pso(obj, lb, ub, f_ieqcons=con, maxiter=maxiter)
        print("Season: {}\t xopt: {}".format(season, xopt))
    
        # Store result
        df_opti = pd.concat([
            df_opti,
            pd.DataFrame({
                'product': list(pipe_dict.keys()),
                'season': [season] * n_product,
                'discount_percentage': xopt,
                'sales_uplift_norm': [
                    pipe_dict[product].predict(build_sample(season, xopt[i]))[0] for i, product in enumerate(pipe_dict.keys())
                ]
            })
        ])
    
    # Save result
    df_opti = df_opti.reset_index(drop=True)
    df_opti.to_excel("so_result.xlsx")
    print("Summary")
    print(df_opti)
    

    It gives:

    Training ...
    Optimization ...
    Stopping search: maximum iterations reached --> 5
    Season: summer   xopt: [0.1941521  0.11233673 0.36548761]
    Stopping search: maximum iterations reached --> 5
    Season: winter   xopt: [0.18670604 0.37829516 0.21857777]
    Stopping search: maximum iterations reached --> 5
    Season: monsoon  xopt: [0.14898102 0.39847885 0.18889792]
    Summary
      product   season  discount_percentage  sales_uplift_norm
    0       A   summer             0.194152           0.175973
    1       B   summer             0.112337           0.229735
    2       C   summer             0.365488           0.374510
    3       A   winter             0.186706          -0.028205
    4       B   winter             0.378295           0.266675
    5       C   winter             0.218578           0.146012
    6       A  monsoon             0.148981           0.199073
    7       B  monsoon             0.398479           0.307632
    8       C  monsoon             0.188898           0.210134
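
    One thing worth checking in this output is whether the budget constraint ever binds. The sketch below reuses the budget, bounds, and placeholder units/mrp values from the code above, together with the xopt vectors printed in the log; the helper name spend is introduced here for illustration.

```python
budget = 20
units = [1, 1, 1]   # placeholder values used in the answer
mrp = [1, 1, 1]
ub = [0.3, 0.4, 0.4]

# xopt vectors copied from the printed output above.
reported = {
    "summer":  [0.1941521, 0.11233673, 0.36548761],
    "winter":  [0.18670604, 0.37829516, 0.21857777],
    "monsoon": [0.14898102, 0.39847885, 0.18889792],
}

def spend(x):
    """Total discount spend for a vector of discount fractions x."""
    return sum(u * p * d for u, p, d in zip(units, mrp, x))

# Largest spend reachable inside the bounds: every discount at its cap.
max_spend = spend(ub)
print("max spend at upper bounds:", max_spend, "budget:", budget)

# Remaining budget at each reported optimum.
slack = {season: budget - spend(x) for season, x in reported.items()}
for season, s in slack.items():
    print(season, "slack:", round(s, 4))
```

    With units and mrp both set to 1, even the maximum discounts spend far less than the budget of 20, so the constraint is inactive in every season and the reported optima are shaped only by the bounds and the Random Forest predictions. Once realistic units and mrp values are plugged in, the constraint may start to matter.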
    
