How to implement dynamic parameter estimation with missing data in Gekko?

风流意气都作罢 提交于 2021-02-10 15:59:07

问题


Going back and forth through the documentation, I was able to set-up a dynamic parameter estimation in Gekko.

Here's the code, with measurement values shown below (the file is named MeasuredAlgebrProductionRate_30min_18h.csv on my system, and uses ;as separator):

import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO

#%% Read measurement data from CSV file
t_x_q_obs = np.genfromtxt('MeasuredAlgebrProductionRate_30min_18h.csv', delimiter=';')
#t_obs, x_obs, q_obs = t_xq_obs[:,0:3]

#%% Initialize Model
m = GEKKO(remote=False)
m.time = t_x_q_obs[:,0] #np.arange(0, 18/24+1e-6, 1/2*1/24)

# Declare parameter
V_liq   = m.Param(value = 159.0)

# Declare FVs
k_1     = m.FV(value = 0.80)
k_1.STATUS = 1
f_1     = m.FV(value = 10.0)
f_1.STATUS = 1

# Diff. Variables
X       = m.Var(value = 80.0) # at t=0
Y       = m.Var(value = 80.0*0.2)

rho_1   = m.Intermediate(k_1*X)
#q_prod  = m.Intermediate(0.52*f_1*X/24)
#X       = m.CV(value = t_x_q_obs[:,1])
q_prod  = m.CV(value = t_x_q_obs[:,2])

#%% Equations
m.Equations([X.dt() == -rho_1, Y.dt() == 0, q_prod == 0.52*f_1*X/24])

m.options.IMODE = 5
m.solve(disp=False)

#%% Plot some results
plt.plot(m.time, np.array(X.value)/10, label='X')
plt.plot(t_x_q_obs[:,0], t_x_q_obs[:,2], label='q_prod Meas.')
plt.plot(m.time, q_prod.value, label='q_prod Sim.')
plt.xlabel('time')
plt.ylabel('X / q_prod')
plt.grid()
plt.legend(loc='best')
plt.show()

0.0208333333 NaN 30.8306036
0.0416666667 NaN 29.1200832
0.0625 74.866 28.7700549
0.0833333333 NaN 29.2318865
0.104166667 NaN 30.7727362
0.125 NaN 29.8743804
0.145833333 NaN 29.9923447
0.166666667 NaN 30.9169679
0.1875 NaN 28.5956184
0.208333333 NaN 27.7361632
0.229166667 NaN 26.6669496
0.25 NaN 27.17477
0.270833333 75.751 23.6270346
0.291666667 NaN 23.0646928
0.3125 NaN 23.6442113
0.333333333 NaN 23.089118
0.354166667 NaN 22.9101616
0.375 NaN 22.7453854
0.395833333 NaN 23.2182759
0.416666667 NaN 21.4901903
0.4375 NaN 21.1449899
0.458333333 NaN 20.7093537
0.479166667 NaN 20.3109086
0.5 NaN 20.6825141
0.520833333 NaN 19.199583
0.541666667 NaN 19.6173416
0.5625 NaN 19.5543139
0.583333333 NaN 20.4501879
0.604166667 NaN 18.7678061
0.625 NaN 18.4629262
0.645833333 NaN 18.3730322
0.666666667 NaN 19.5375442
0.6875 NaN 18.1975297
0.708333333 NaN 18.0370627
0.729166667 NaN 17.5734727
0.75 NaN 18.8632046

So far, so good. Suppose I also have measurements of X (second column) at some time points (first column), the rest is not available (therefore NaN). I would like to adjust k_1 and f_1, so that simulated and observed variables X and q_prod match as closely as possible.

Is this feasible with Gekko? If so, how?

Another question: Gekko throws an error if m.time has more elements than there are time points of observed variables. However, my initial values of X and Y refer to t=0, not t=0.0208333333. Hence, the commented out part after m.time =, see above. (Measurements at t=0 are not available.) Do initial conditions in Gekko refer to the first element of m.time, as they do in Matlab, or to t=0?


回答1:


If you have a missing measurement then you can include a non-numeric value such as NaN and Gekko ignores that entry in the objective function. Here is a test case with one NaN value in ym:

Nonlinear Regression with NaN Data Value

from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
xm = np.array([0,1,2,3,4,5])
ym = np.array([0.1,0.2,np.nan,0.5,0.8,2.0])
m = GEKKO(remote=False)
x = m.Param(value=xm,name='x')
a = m.FV()
a.STATUS=1
y = m.CV(value=ym,name='y')
y.FSTATUS=1
m.Equation(y==0.1*m.exp(a*x))
m.options.IMODE = 2
m.options.SOLVER = 1
m.solve(disp=True)
print('Optimized, a = ' + str(a.value[0]))
plt.plot(xm,ym,'bo')
plt.plot(xm,y.value,'r-')
m.open_folder()
plt.show()

When you open the run folder with m.open_folder() and look at the data file gk_model0.csv, there is the NaN in the y value column.

y,x
0.1,0
0.2,1
nan,2
0.5,3
0.8,4
2.0,5

This is IMODE=2 so it is a steady state regression problem but shows the same thing that happens with dynamic estimation problems. There is more information on the objective function with m.options.EV_TYPE=1 (default) or m.options.EV_TYPE=2 for estimation and how bad values are handled in a data file. When the measurement is a non-numeric value, that bad value is dropped from the objective function summation. Here is a version with a dynamic model:

Dynamic Regression with Fixed Initial Condition

from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
xm = np.array([0,1,2,3,4,5])
ym = np.array([2.0,1.5,np.nan,2.2,3.0,5.0])
m = GEKKO(remote=False)
m.time = xm
a = m.FV(lb=0.1,ub=2.0)
a.STATUS=1
y = m.CV(value=ym,name='y',fixed_initial=False)
y.FSTATUS=1
m.Equation(y.dt()==a*y)
m.options.IMODE = 5
m.options.SOLVER = 1
m.solve(disp=True)
print('Optimized, a = ' + str(a.value[0]))
plt.figure(figsize=(6,2))
plt.plot(xm,ym,'bo',label='Meas')
plt.plot(xm,y.value,'r-',label='Pred')
plt.ylabel('y')
plt.ylim([0,6])
plt.legend()
plt.show()

As you observed, you need to have the same length for m.time as for your measurement values. If you are missing values then you can include append a np.nan to the beginning of the data horizon. By default, Gekko uses the first value specified in the value property to set the initial condition. If you don't want Gekko to use that value then set fixed_initial=False for your CV.

Dynamic Regression with Free Initial Condition

y = m.CV(value=ym,name='y',fixed_initial=False)


来源:https://stackoverflow.com/questions/57598778/how-to-implement-dynamic-parameter-estimation-with-missing-data-in-gekko

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!