Question
data2 = pd.DataFrame(data1['kwh'])
data2
kwh
date
2012-04-12 14:56:50 1.256400
2012-04-12 15:11:55 1.430750
2012-04-12 15:27:01 1.369910
2012-04-12 15:42:06 1.359350
2012-04-12 15:57:10 1.305680
2012-04-12 16:12:10 1.287750
2012-04-12 16:27:14 1.245970
2012-04-12 16:42:19 1.282280
2012-04-12 16:57:24 1.365710
2012-04-12 17:12:28 1.320130
2012-04-12 17:27:33 1.354890
2012-04-12 17:42:37 1.343680
2012-04-12 17:57:41 1.314220
2012-04-12 18:12:44 1.311970
2012-04-12 18:27:46 1.338980
2012-04-12 18:42:51 1.357370
2012-04-12 18:57:54 1.328700
2012-04-12 19:12:58 1.308200
2012-04-12 19:28:01 1.341770
2012-04-12 19:43:04 1.278350
2012-04-12 19:58:07 1.253170
2012-04-12 20:13:10 1.420670
2012-04-12 20:28:15 1.292740
2012-04-12 20:43:15 1.322840
2012-04-12 20:58:18 1.247410
2012-04-12 21:13:20 0.568352
2012-04-12 21:28:22 0.317865
2012-04-12 21:43:24 0.233603
2012-04-12 21:58:27 0.229524
2012-04-12 22:13:29 0.236929
2012-04-12 22:28:34 0.233806
2012-04-12 22:43:38 0.235618
2012-04-12 22:58:43 0.229858
2012-04-12 23:13:43 0.235132
2012-04-12 23:28:46 0.231863
2012-04-12 23:43:55 0.237794
2012-04-12 23:59:00 0.229634
2012-04-13 00:14:02 0.234484
2012-04-13 00:29:05 0.234189
2012-04-13 00:44:09 0.237213
2012-04-13 00:59:09 0.230483
2012-04-13 01:14:10 0.234982
2012-04-13 01:29:11 0.237121
2012-04-13 01:44:16 0.230910
2012-04-13 01:59:22 0.238406
2012-04-13 02:14:21 0.250530
2012-04-13 02:29:24 0.283575
2012-04-13 02:44:24 0.302299
2012-04-13 02:59:25 0.322093
2012-04-13 03:14:30 0.327600
2012-04-13 03:29:31 0.324368
2012-04-13 03:44:31 0.301869
2012-04-13 03:59:42 0.322019
2012-04-13 04:14:43 0.325328
2012-04-13 04:29:43 0.306727
2012-04-13 04:44:46 0.299012
2012-04-13 04:59:47 0.303288
2012-04-13 05:14:48 0.326205
2012-04-13 05:29:49 0.344230
2012-04-13 05:44:50 0.353484
...
65701 rows × 1 columns
I have this dataframe with a datetime index and one column. I want to do a simple prediction using linear regression with sklearn. I'm very confused and I don't know how to set X and y (I want the x values to be the time and the y values to be kwh). I'm new to Python, so any help is valuable. Thank you.
Answer 1:
The first thing you have to do is split your data into two arrays, X and y. Each element of X will be a date, and the corresponding element of y will be the associated kwh. Note that sklearn expects X to be a 2-dimensional array of numbers, so the dates must first be converted to a numeric value (for example, seconds since the first reading).
Once you have that, you will want to use sklearn.linear_model.LinearRegression to do the regression. See its documentation for details.
As with every sklearn model, there are two steps. First, fit the model to your data. Then put the dates for which you want to predict the kwh in another array, X_predict, and predict the kwh using the predict method.
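from sklearn.linear_model import LinearRegression
X = [] # put your dates in here, as numbers, one row per sample: [[t0], [t1], ...]
y = [] # put your kwh in here
model = LinearRegression()
model.fit(X, y) # X and y must be filled in first; fitting on empty lists raises an error
X_predict = [] # put the dates for which you want to predict kwh here, in the same format as X
y_predict = model.predict(X_predict)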
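To make the placeholders above concrete, here is a minimal sketch of one way to turn a datetime index into a numeric X. The small `data2` frame is a hypothetical stand-in for the question's data, and using "seconds since the first reading" as the feature is an assumption, not the only possible encoding.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the question's data: a kwh column on a DatetimeIndex.
index = pd.date_range("2012-04-12 15:00:00", periods=8, freq="15min")
data2 = pd.DataFrame(
    {"kwh": [1.26, 1.43, 1.37, 1.36, 1.31, 1.29, 1.25, 1.28]}, index=index
)

# Feature: seconds elapsed since the first reading, shaped (n_samples, 1).
elapsed = (data2.index - data2.index[0]).total_seconds()
X = np.asarray(elapsed).reshape(-1, 1)
y = data2["kwh"].to_numpy()

model = LinearRegression().fit(X, y)

# Dates to predict must be encoded exactly like the training dates.
future = pd.to_datetime(["2012-04-12 17:00:00"])
X_predict = np.asarray((future - data2.index[0]).total_seconds()).reshape(-1, 1)
y_predict = model.predict(X_predict)
print(y_predict)
```

Subtracting the first timestamp keeps the feature values small; any consistent numeric encoding of time would work as long as training and prediction use the same one.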
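Answer 2:
The predict() function takes a 2-dimensional array as its argument. So, if you want to predict the value for a simple linear regression, you have to pass the input value inside a 2-dimensional array. A date cannot be written as a bare literal; it must first be converted to a number (here, Unix seconds via pandas imported as pd, assuming the model was fitted on dates encoded the same way), like this:
model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])
For a multiple linear regression, pass one value per feature:
model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])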
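As a small illustration of the 2-dimensional input requirement, the sketch below uses made-up toy data (y = 2x + 1) rather than the kwh readings. It shows a single-sample prediction, a reshape of a 1-dimensional array into a column, and the error sklearn raises when a flat list is passed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data, made up for illustration: y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# One sample with one feature: a list containing one row.
single = model.predict([[4.0]])

# Several samples: reshape a flat array into a column of shape (n_samples, 1).
several = model.predict(np.array([4.0, 5.0]).reshape(-1, 1))

# A flat 1-D input is rejected with a ValueError.
try:
    model.predict([4.0])
    raised = False
except ValueError:
    raised = True

print(single, several, raised)
```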
Answer 3:
You can have a look at my code on GitHub, where I predict temperature from cricket chirps using a simple linear regression model. I have explained the code with comments.
#Import the libraries required
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the excel data
dataset = pd.read_excel(r'D:\MachineLearing\Machine Learning A-Z Template Folder\Part 2 - Regression\Section 4 - Simple Linear Regression\CricketChirpsVs.Temperature.xls')  # raw string so the backslashes are not treated as escapes
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
#Split the data into train and test dataset
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=42)
#Fitting Simple Linear regression data model to train data set
from sklearn.linear_model import LinearRegression
regressorObject=LinearRegression()
regressorObject.fit(x_train,y_train)
#predict the test set
y_pred_test_data=regressorObject.predict(x_test)
# Visualising the Training set results in a scatter plot
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Training set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()
# Visualising the test set results in a scatter plot
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Test set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()
For more information please visit
https://github.com/wins999/Cricket_Chirps_Vs_Temprature--Simple-Linear-Regression-in-Python-
Answer 4:
Linear Regression:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data=pd.read_csv('Salary_Data.csv')
X=data.iloc[:,:-1].values
y=data.iloc[:,1].values
#split dataset in train and testing set
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=10,random_state=0)
from sklearn.linear_model import LinearRegression
regressor=LinearRegression()
regressor.fit(X_train,Y_train)
y_pre=regressor.predict(X_test)
Answer 5:
You should implement the following code.
import pandas as pd
from sklearn.linear_model import LinearRegression # to build linear regression model
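from sklearn.model_selection import train_test_split # to split dataset (cross_validation was removed in scikit-learn 0.20)
data2 = pd.DataFrame(data1['kwh'])
data2 = data2.reset_index() # creates a new index (0 to 65700), so date becomes an ordinary column
dates = pd.to_datetime(data2.iloc[:,0]) # date column
# LinearRegression needs numeric 2-D input, so use seconds since the first reading
X = (dates - dates.iloc[0]).dt.total_seconds().to_frame()
y = data2.iloc[:,-1] # kwh column
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.80, random_state=20)
linearModel = LinearRegression()
linearModel.fit(Xtrain, ytrain)
ypred = linearModel.predict(Xtest)
Here ypred holds the predicted kwh values for the test set. Linear regression returns continuous predictions, not probabilities.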
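One way to check the output of predict is to compare it with the held-out values. The sketch below assumes synthetic data (a noisy straight line standing in for the kwh series) and uses mean absolute error and R² as example metrics; neither the data nor the choice of metrics comes from the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a noisy straight line, not the real kwh readings.
rng = np.random.default_rng(0)
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 0.01 * X.ravel() + 1.0 + rng.normal(scale=0.05, size=200)

Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, train_size=0.80, random_state=20
)
linearModel = LinearRegression().fit(Xtrain, ytrain)
ypred = linearModel.predict(Xtest)

# ypred contains continuous predicted values, one per test sample.
mae = mean_absolute_error(ytest, ypred)
r2 = r2_score(ytest, ypred)
print(f"MAE: {mae:.3f}, R^2: {r2:.3f}")
```

For data ordered in time, a random split mixes past and future readings; whether that is acceptable depends on what the prediction is meant to be used for.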
Source: https://stackoverflow.com/questions/29623171/simple-prediction-using-linear-regression-with-python