I am trying to fit a linear regression to each group of a grouped pandas DataFrame in Python:
This is the dataframe df:
group date value
A 01-02-201
This might be a late response, but I am posting the answer anyway in case someone else encounters the same problem. Everything shown was correct except for the regression block. There are two problems with the implementation:
First, model.fit(X, y) expects X to be an {array-like, sparse matrix} of shape (n_samples, n_features), so X must be 2D even when there is only one feature. y may be 1D, of shape (n_samples,), or 2D. A pandas Series has no reshape method, so convert it to a NumPy array first and then call reshape(-1, 1).
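As a minimal sketch of the shape requirement (the numbers here are made up for illustration and are not from the question), the snippet below shows that scikit-learn rejects a 1D X and accepts the same data once it is reshaped to a single column:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y = 2 * x + 1, not taken from the question.
x_1d = np.array([0.0, 1.0, 2.0, 3.0])   # shape (4,)
y = 2.0 * x_1d + 1.0                    # a 1D target is accepted by fit

x_2d = x_1d.reshape(-1, 1)              # shape (4, 1): one column, one row per sample

try:
    LinearRegression().fit(x_1d, y)     # 1D X is rejected
    raised = False
except ValueError:
    raised = True

model = LinearRegression().fit(x_2d, y)  # 2D X works
print(x_1d.shape, x_2d.shape, raised)
```

The -1 in reshape(-1, 1) tells NumPy to infer the number of rows from the length of the array.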
The second problem is the fitting call itself: X and y are not arguments to the constructor, as in model = LinearRegression(y, X), but to model.fit(X, y).
Here is the modified regression block:
    import numpy as np
    from sklearn.linear_model import LinearRegression

    for group, df_sub in df_group:  # df_group is the GroupBy object from the question
        # a Series has no reshape method, so convert to an array first
        X = np.array(df_sub['date_delta']).reshape(-1, 1)
        y = np.array(df_sub['value'])  # fit also accepts a 1D target
        model = LinearRegression()  # <--- this does not accept (X, y)
        model.fit(X, y)
        # scikit-learn's LinearRegression has no summary() method; print the fitted parameters instead
        print(group, model.coef_, model.intercept_)
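For anyone who wants to try the per-group fit end to end, here is a self-contained sketch. The data frame is made up, and I am assuming date_delta means the number of days since the earliest date, which may differ from how the question computed it:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: two groups whose values are exactly linear in time.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03",
                            "2016-01-01", "2016-01-02", "2016-01-03"]),
    "value": [1.0, 3.0, 5.0, 10.0, 9.0, 8.0],
})

# Assumption: date_delta is the number of days since the earliest date.
df["date_delta"] = (df["date"] - df["date"].min()).dt.days

fits = {}
for name, sub in df.groupby("group"):
    X = sub[["date_delta"]].to_numpy()   # double brackets keep X 2D: (n_samples, 1)
    y = sub["value"].to_numpy()          # 1D target
    model = LinearRegression().fit(X, y)
    fits[name] = (float(model.coef_[0]), float(model.intercept_))

for name, (slope, intercept) in fits.items():
    print(name, slope, intercept)
```

Collecting the slope and intercept in a dict keyed by group name makes it easy to turn the results into a DataFrame later. If you need a statsmodels-style summary table, that comes from statsmodels, not from scikit-learn's LinearRegression.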