Question
I'm using previous demand to predict future demand with 3 variables, but whenever I run the code, building my Y array raises an error. If I use only one variable for Y on its own, there is no error.
Example:
demandaY = bike_data[['cnt']]
n_steps = 20
for time_step in range(1, n_steps+1):
    demandaY['cnt'+str(time_step)] = demandaY[['cnt']].shift(-time_step).values
y = demandaY.iloc[:, 1:].values
y = np.reshape(y, (y.shape[0], n_steps, 1))
DATASET
SCRIPT
features = ['cnt','temp','hum']
demanda = bike_data[features]
n_steps = 20
for var_col in features:
    for time_step in range(1, n_steps+1):
        demanda[var_col+str(time_step)] = demanda[[var_col]].shift(-time_step).values
demanda.dropna(inplace=True)
demanda.head()
n_var = len(features)
columns = list(filter(lambda col: not(col.endswith("%d" % n_steps)), demanda.columns))
X = demanda[columns].iloc[:, :(n_steps*n_var)].values
X = np.reshape(X, (X.shape[0], n_steps, n_var))
y = demanda.iloc[:, 0].values
y = np.reshape(y, (y.shape[0], n_steps, 1))
OUTPUT
ValueError: cannot reshape array of size 17379 into shape (17379,20,1)
GitHub: repository
Answer 1:
It's not clear whether the OP still wants an answer, but I will post the answer I linked in the comment, with a few modifications.
Time series datasets can be of different types. Let's consider a dataset that has X as features and Y as labels. Depending on the problem, Y might be a sample from X shifted in time, or it can be another target variable you want to predict.
def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    dataX, dataY = [], []
    for i in range(0, (len(X) - look_back + 1), stride):
        a = X[i:(i + look_back)]
        dataX.append(a)
        b = Y[i + look_back + label_lag]
        dataY.append(b)
    return np.array(dataX), np.array(dataY)
print(features.values.shape,labels.shape)
#(619,4), (619,1)
x,y = create_dataset(X=features.values,Y=labels.values,look_back=10,stride=1)
(x.shape,y.shape)
#(610, 10, 4), (610, 1)
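For the three-variable case in the question, the same windowing can be applied to a frame with cnt, temp and hum columns. The following is a minimal sketch, not the OP's pipeline: it assumes random synthetic data in place of bike_data (the real dataset is not reproduced here) and uses cnt as the label. The loop bound is also tightened slightly compared with the function above, so that the label index stays in range when label_lag is 0 or larger.

```python
import numpy as np
import pandas as pd

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    """Slice X into windows of look_back rows and pick one label per window."""
    dataX, dataY = [], []
    # Last usable start: both the window and the label must stay in range.
    last_start = min(len(X) - look_back, len(Y) - 1 - look_back - label_lag)
    for i in range(0, last_start + 1, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Synthetic stand-in for bike_data (assumption: random values, 500 rows).
rng = np.random.default_rng(0)
bike_data = pd.DataFrame(rng.random((500, 3)), columns=["cnt", "temp", "hum"])

features = bike_data[["cnt", "temp", "hum"]]
labels = bike_data[["cnt"]]

x, y = create_dataset(features.values, labels.values, look_back=20)
print(x.shape, y.shape)  # (481, 20, 3) (481, 1)
```

Here x already has the (samples, timesteps, variables) layout, so no extra np.reshape is needed.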
Use of the other parameters:
label_lag: if the last row of an X sample is at time t, the Y sample will be at time t+1+label_lag. The default value (-1) puts the last row of X and the Y sample at the same index t.
For example, the row indices of the last row of the sample x[1] and of its label y[1]:
if label_lag is -1:
np.where(x[1,-1]==features.values)[0],np.where(y[1] == labels.values)[0]
#(10,10,10,10), (10)
if label_lag is 0:
np.where(x[1,-1]==features.values)[0],np.where(y[1] == labels.values)[0]
#(10,10,10,10), (11)
look_back: the number of past rows of your dataset, up to and including the current timestep t, that go into one sample. A look_back of 10 means a single sample holds the 10 rows from t-9 to t.
stride: the index gap between two consecutive samples. With stride=2, if the 1st sample of X has rows 0 to 9, then the 2nd sample will have rows 2 to 11.
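The effect of label_lag and stride can be checked on data whose values equal their own row indices. This is only an illustrative sketch: it repeats the slicing used in create_dataset for the first few windows rather than calling the function, and the array of 30 rows is an arbitrary choice.

```python
import numpy as np

data = np.arange(30)          # each value equals its own row index
look_back, stride = 10, 2

rows = []
for label_lag in (-1, 0):
    # First three window starts with stride=2: 0, 2, 4.
    for i in range(0, 5, stride):
        window = data[i:i + look_back]           # rows i .. i+look_back-1
        label = data[i + look_back + label_lag]  # one label per window
        rows.append((label_lag, int(window[0]), int(window[-1]), int(label)))
        print(rows[-1])
# (-1, 0, 9, 9)   (-1, 2, 11, 11)  (-1, 4, 13, 13)
# (0, 0, 9, 10)   (0, 2, 11, 12)   (0, 4, 13, 14)
```

Each tuple is (label_lag, first window row, last window row, label row). With label_lag=-1 the label sits on the last row of the window, and with label_lag=0 it sits one row after it.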
Furthermore, you can also have a lookback in Y, depending on your current problem, and Y can also be multi-dimensional. In that case the only change is b=Y[i:(i+look_back+label_lag)].
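If the goal is to predict the next n_steps values, as in the question, one option is to take a window of future Y rows as the label. The sketch below is a variant, not the expression above: create_multistep_dataset and its horizon parameter are hypothetical names introduced here, the label window starts right after the input window, the data is random, and X and Y are assumed to have the same length.

```python
import numpy as np

def create_multistep_dataset(X, Y, look_back=10, horizon=5, stride=1):
    """Pair each window of look_back rows of X with the next horizon rows of Y."""
    dataX, dataY = [], []
    # Stop early enough that the label window fits inside Y.
    last_start = len(X) - look_back - horizon
    for i in range(0, last_start + 1, stride):
        dataX.append(X[i:i + look_back])
        dataY.append(Y[i + look_back:i + look_back + horizon])
    return np.array(dataX), np.array(dataY)

# Synthetic data (assumption): 200 rows, 3 variables, first column is the target.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
Y = X[:, :1]

x, y = create_multistep_dataset(X, Y, look_back=20, horizon=20)
print(x.shape, y.shape)  # (161, 20, 3) (161, 20, 1)
```

This yields a y of shape (samples, n_steps, 1) directly from the slicing, without reshaping a single column.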
The same functionality can be achieved with TimeseriesGenerator from keras.
TimeseriesGenerator(features.values,labels.values,length=10,batch_size=64,stride=1)
where length is the same as look_back. By default there is a gap of 1 between features and labels, i.e. a sample in X holds the 10 rows ending at index t and the corresponding sample in Y is at index t+1. If you want both at the same index, just shift the labels by one before passing them to the generator.
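The alignment and the shift can be checked without Keras. The following NumPy-only sketch assumes the generator pairs the window data[t-length:t] with targets[t]; emulate_pairs is a hypothetical helper written for this illustration, not code taken from Keras.

```python
import numpy as np

data = np.arange(20)      # each value equals its own row index
targets = np.arange(20)
length = 10

def emulate_pairs(data, targets, length):
    """Pair data[t-length:t] with targets[t] (assumed generator behaviour)."""
    return [(data[t - length:t], targets[t]) for t in range(length, len(data))]

# Default alignment: the label is one row after the last row of the window.
window, label = emulate_pairs(data, targets, length)[0]
print(window[-1], label)        # 9 10

# Shift the labels by one so that shifted[t] == targets[t - 1].
shifted = np.roll(targets, 1)
window_s, label_s = emulate_pairs(data, shifted, length)[0]
print(window_s[-1], label_s)    # 9 9
```

np.roll wraps the last label around to index 0, but that position is never used as a target here because the first target index is length.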
Source: https://stackoverflow.com/questions/58721591/recurrent-neural-networks-for-time-series-with-multiple-variables-tensorflow