prediction

Is it acceptable to scale target values for regressors?

可紊 posted on 2019-12-25 01:04:21
Question: I am getting very high RMSE and MAE for MLPRegressor, ForestRegression and linear regression when only the input variables are scaled (30,000+). However, when I scale the target values as well, I get an RMSE of 0.2. I would like to know whether that is an acceptable thing to do. Secondly, is it normal to have better R-squared values for test than for train (i.e. 0.98 for test and 0.85 for train)? Thank you. Answer 1: It is actually a common practice to scale target values in many cases. For example a highly skewed target may give better results if it
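One point worth keeping in mind when comparing the two RMSE figures: RMSE is expressed in the units of the target, so a value computed on a scaled target is not directly comparable with one computed in the original units. The following is a minimal sketch in plain Python, not the asker's code. The numbers are made up, the helper names `rmse` and `standardize` are hypothetical, and it assumes the target was standardized with its mean and standard deviation (the question does not say which scaler was used).

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up target values and predictions in the original units (assumption).
y_true = [120_000.0, 150_000.0, 90_000.0, 200_000.0, 170_000.0]
y_pred = [110_000.0, 160_000.0, 100_000.0, 190_000.0, 180_000.0]

# Standardization parameters taken from the true target values.
mean_y = statistics.fmean(y_true)
std_y = statistics.pstdev(y_true)


def standardize(values):
    """Apply the same (value - mean) / std transform to any sequence."""
    return [(v - mean_y) / std_y for v in values]


rmse_original = rmse(y_true, y_pred)
rmse_scaled = rmse(standardize(y_true), standardize(y_pred))

# Multiplying by the target's standard deviation brings the scaled RMSE
# back to the original units.
rmse_back = rmse_scaled * std_y

print(rmse_original, rmse_scaled, rmse_back)
```

Under this assumption the scaled RMSE is just the original RMSE divided by the target's standard deviation, so a much smaller number after scaling does not by itself mean the model improved. Converting back to the original units before comparing is the safer check.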

My objective is to predict the next 3 events of each id_num based on their previous events

限于喜欢 posted on 2019-12-25 00:34:20
Question: I am new to data science and I am working on a model that looks roughly like the sample data shown below. However, in the original data there are many id_num and Events values. My objective is to predict the next 3 events of each id_num based on their previous Events. Please help me solve this, or suggest a method to use, in R. Answer 1: The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line what the
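The "sequence repeats" baseline from the answer can be sketched in a few lines. The sketch is in Python rather than R, the data are made up, and the function name `predict_next_events` is hypothetical. It assumes that each id_num's history restarts from its first event once it ends.

```python
from itertools import cycle, islice


def predict_next_events(history, n_ahead=3):
    """Return the next n_ahead events, assuming the history repeats from its start."""
    if not history:
        return []
    return list(islice(cycle(history), n_ahead))


# Made-up event histories keyed by id_num (assumption, not the asker's data).
events = {
    1: ["A", "B", "C"],
    2: ["B", "A"],
}

predictions = {id_num: predict_next_events(history) for id_num, history in events.items()}
print(predictions)
```

This is only a naive baseline. It is useful mainly as a reference point against which a real sequence model can be compared.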

Reproduce predictions with MOJO file of a H2O GBM model

吃可爱长大的小学妹 posted on 2019-12-24 22:32:21
Question: I used H2O version 3.26.0.5 to train a GBM model on a binary problem, to predict the probability of the positive class. I saved the model as a MOJO file and used this file to generate predictions on new data:
    ## first, restart R session ##
    # load the model
    library(h2o)
    h2o.init(nthreads = -1)
    model <- h2o.import_mojo("path_to_mojo_file")
    # load the new data input
    input <- read_csv("path_to_new_data")
    input_h2o <- as.h2o(input)
    # predictions
    predictions <- predict(model, input_h2o)
When I run this

How to prove the reliability of a predictive model to executives?

白昼怎懂夜的黑 posted on 2019-12-24 16:40:18
Question: I trained a model on data from 500 devices to predict their performance. Then I applied the trained model to a test data set from another 500 devices, and it showed pretty good prediction results. Now my executives want me to prove that this model will work well on one million devices, not only on 500. Obviously we don't have data for one million devices. And if the model is not reliable, they want me to determine how much training data is required to make reliable predictions on one million devices. How

how to predict using var with exogenous variables in R

北城以北 posted on 2019-12-24 12:18:05
Question: I have the following data: library(data.table) modelling_dt_train <- structure(list(`1` = c(54593L, 74481L, 85566L, 97637L, 101081L, 184089L, 158895L, 153780L, 153681L, 157188L, 142216L, 136437L, 135501L, 111264L, 123259L, 110397L, 146034L, 162900L, 132499L, 121516L, 119651L, 114045L, 112551L, 123209L, 134930L, 132147L, 151327L, 155666L, 158538L, 205766L, 200407L, 219588L, 231954L, 179884L, 159121L, 156148L, 136191L, 132956L, 202086L, 141047L, 118490L, 116595L, 127620L, 135962L, 137419L,

PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed

梦想与她 posted on 2019-12-24 10:58:01
Question: Can we save any of the created LSTM models themselves? I believe that "pickling" is the standard method to serialize Python objects to a file. Ideally, I wanted to create a Python module containing one or more functions that either let me specify an LSTM model to load or use a hard-coded pre-fit model to generate forecasts based on data passed in to initialize the model. I tried to use it, but it gave me an error. Code that I used:
    # create and fit the LSTM network
    batch_size = 1
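The error in the title can be reproduced without any deep learning library, because `pickle` refuses to serialize module objects. The following is a minimal sketch using only the standard library, not the asker's code. The dictionary `state_with_module` is a hypothetical stand-in for an object that holds a reference to a module somewhere in its state. The exact exception type and message vary between Python versions, so both `pickle.PicklingError` and `TypeError` are caught.

```python
import math
import pickle
import types

# Hypothetical object state that mixes plain data with a module reference.
state_with_module = {"scale": 2.0, "backend": math}

try:
    pickle.dumps(state_with_module)
    pickling_failed = False
    error_message = ""
except (pickle.PicklingError, TypeError) as exc:
    # Module objects are not picklable, so this branch is taken.
    pickling_failed = True
    error_message = str(exc)

# Keeping only the plain data makes the state serializable again.
plain_state = {
    key: value
    for key, value in state_with_module.items()
    if not isinstance(value, types.ModuleType)
}
restored = pickle.loads(pickle.dumps(plain_state))

print(pickling_failed, error_message, restored)
```

If the model comes from a framework that ships its own save and load routines, those are usually a safer route than pickling the model object directly. Whether that applies here depends on the library used, which the truncated question does not show.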

Linear regression with `lm()`: prediction interval for aggregated predicted values

故事扮演 posted on 2019-12-24 10:47:21
Question: I'm using predict.lm(fit, newdata=newdata, interval="prediction") to get predictions and their prediction intervals (PIs) for new observations. Now I would like to aggregate (sum and mean) these predictions and their PIs based on an additional variable (i.e. a spatial aggregation at the zip code level of predictions for single households). I learned from StackExchange that you cannot aggregate the prediction intervals of single predictions just by aggregating the limits of the prediction
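A rough numeric illustration of why summing the limits does not work: variances add, so interval half-widths do not. This is a sketch in Python under strong simplifying assumptions, not a replacement for a proper calculation. It treats the per-household errors as independent with a common standard deviation, uses the normal quantile 1.96 rather than a t quantile, and ignores the uncertainty in the fitted coefficients, which in a real `lm()` fit is shared across predictions and makes them correlated. All numbers are made up.

```python
import math

z = 1.96        # approximate 95% normal quantile (assumption: normal, not t)
sigma = 10.0    # hypothetical residual standard deviation per household
n = 25          # hypothetical number of households in one zip code

# Half-width of a single household's interval under these assumptions.
half_width_single = z * sigma

# Naive aggregation: add up the limits, so the half-width grows like n.
naive_sum_half_width = n * half_width_single

# Independent errors: variances add, so the half-width grows like sqrt(n).
independent_sum_half_width = z * sigma * math.sqrt(n)

print(naive_sum_half_width, independent_sum_half_width)
```

In this simplified setting the naive interval for the sum is wider than the variance-based one by a factor of sqrt(n). With a real model the covariance between predictions has to be taken into account as well, so the true interval lies somewhere that neither shortcut captures exactly.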

Get risk predictions in WEKA using own Java code

拟墨画扇 posted on 2019-12-24 04:24:05
Question: I already checked the "Making predictions" documentation of WEKA, and it contains explicit instructions for command-line and GUI predictions. I want to know how to get, in my own Java code, prediction values like the ones below, which I got from the GUI using the Agrawal dataset (weka.datagenerators.classifiers.classification.Agrawal):
    inst#, actual, predicted, error, prediction
    1, 1:0, 2:1, +, 0.941
    2, 1:0, 1:0, , 1
    3, 1:0, 1:0, , 1
    4, 1:0, 1:0, , 1
    5, 1:0, 1:0, , 1
    6, 1:0, 1:0, , 1
    7, 1:0, 2:1, +,

Forecast package Prediction Horizon issue in R

你说的曾经没有我的故事 posted on 2019-12-24 01:13:03
Question: I am new to R. I was trying to forecast using the Holt method, but I am getting this strange error. I am using the forecast package v7.1 with R (version 3.2.5) and RStudio (version 0.99.896). I reinstalled everything from R to RStudio, but that did not help. Only h from 1 to 10 works. Any ideas?
    library(forecast)
    library(stats)
    library(base)
    x=data$cost
    k<-holt(x,damped=TRUE)
    m=forecast(k,h=20)
    Error in forecast.forecast(k, h = 20) : Please select a longer horizon when the forecasts are first computed
Other testing cases

Predict/estimate values using randomForest in R

早过忘川 posted on 2019-12-24 00:57:04
Question: I want to predict values for my Pop_avg field in my unsurveyed areas based on surveyed areas. I am using randomForest based on a suggestion to my earlier question. My surveyed areas:
    > surveyed <- read.csv("summer_surveyed.csv", header = T)
    > surveyed_1 <- surveyed[, -c(1,2,3,5,6,7,9,10,11,12,13,15)]
    > head(surveyed_1, n=1)
      VEGETATION Pop_avg Acres_1
    1 Acer rubrum-Vaccinium corymbosum-Amelanchier spp. 0 27.68884
My unsurveyed areas:
    > unsurveyed <- read.csv("summer_unsurveyed.csv", header = T