I\'m trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the e
The answer from jseabold works very well, but it may be not enough if you the want to do some computation on the predicted values and true values, e.g. if you want to use the function mean_squared_error
. In that case, it may be better to get definitely rid of NaN
df = pd.read_csv('cl_030314.csv')
df_cleaned = df.dropna()
results = sm.ols(formula = "da ~ cfo + rm_proxy + cpi + year", data=df_cleaned).fit()