prediction | 易学教程

Is it acceptable to scale target values for regressors?

Question: I am getting very high RMSE and MAE (30,000+) for MLPRegressor, ForestRegression, and linear regression when only the input variables are scaled. However, when I scale the target values as well, I get an RMSE of 0.2. I would like to know whether that is an acceptable thing to do. Secondly, is it normal to have a better R-squared value for test than for train (i.e. 0.98 for test and 0.85 for train)? Thank you.

Answer 1: It is actually common practice to scale target values in many cases. For example, a highly skewed target may give better results if it
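The drop from 30,000+ to 0.2 is largely a change of units, which a small sketch can make concrete. The data, the single-feature least-squares model, and the helper names `fit_line` and `rmse` below are all hypothetical and are not the asker's models; the point is only that an RMSE computed on a standardized target has to be converted back to the original units before it is compared with an unscaled RMSE.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one feature)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(statistics.fmean([(a - p) ** 2 for a, p in zip(actual, predicted)]))

# Hypothetical data: a target measured in large units.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [52000, 61000, 75000, 79000, 93000, 99000, 112000, 118000]

# Standardize the target: subtract the mean, divide by the standard deviation.
mu, sigma = statistics.fmean(ys), statistics.pstdev(ys)
ys_scaled = [(y - mu) / sigma for y in ys]

# Fit and evaluate on the scaled target.
a, b = fit_line(xs, ys_scaled)
pred_scaled = [a + b * x for x in xs]
rmse_scaled = rmse(ys_scaled, pred_scaled)

# Undo the scaling on the predictions, then evaluate in the original units.
pred_original = [p * sigma + mu for p in pred_scaled]
rmse_original = rmse(ys, pred_original)

print(f"RMSE on scaled target:   {rmse_scaled:.3f}")
print(f"RMSE in original units:  {rmse_original:.1f}")
```

For standardization, the error in original units is the scaled error multiplied by the target's standard deviation, so a small scaled RMSE does not by itself mean the model improved.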

My objective is to predict the next 3 events of each id_num based on their previous events

Question: I am new to data science and am working on a model whose data looks like the sample shown below. However, in the original data there are many id_num and Events values. My objective is to predict the next 3 events of each id_num based on their previous Events. Please help me solve this, or suggest a method for solving it, using R.

Answer 1: The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the
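The "sequence repeats" baseline from the answer can be sketched as follows. This is written in Python rather than R, and the `history` data and the `predict_next` helper are hypothetical, since the original sample data is not shown here. It assumes that "repeat" means the observed events start over from the first one.

```python
from itertools import cycle, islice

def predict_next(events, n=3):
    """Predict the next n events by assuming the observed sequence
    starts over from its first element once it ends."""
    if not events:
        return []
    return list(islice(cycle(events), n))

# Hypothetical event histories keyed by id_num.
history = {
    1: ["A", "B", "C", "D"],
    2: ["X", "Y"],
}

predictions = {id_num: predict_next(events) for id_num, events in history.items()}
print(predictions)
```

A baseline like this is mainly useful as a reference point against which a real sequence model can be compared.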

Reproduce predictions with MOJO file of a H2O GBM model

Question: I used H2O version 3.26.0.5 to train a GBM model on a binary problem, to predict the probability of the positive class. I saved the model as a MOJO file and used it to generate predictions on new data:

    ## first, restart R session ##
    # load the model
    library(h2o)
    h2o.init(nthreads = -1)
    model <- h2o.import_mojo("path_to_mojo_file")
    # load the new data input
    input <- read_csv("path_to_new_data")
    input_h2o <- as.h2o(input)
    # predictions
    predictions <- predict(model, input_h2o)

When I run this

How to prove the reliability of a predictive model to executives?

Question: I trained a model on data from 500 devices to predict their performance. I then applied the trained model to a test set of another 500 devices, and it showed pretty good prediction results. Now my executives want me to prove that this model will work well on one million devices, not only on 500. Obviously, we don't have data for one million devices. And if the model is not reliable, they want me to determine how much training data is required to make reliable predictions on one million devices. How

how to predict using var with exogenous variables in R

Question: I have the following data: library(data.table) modelling_dt_train <- structure(list(`1` = c(54593L, 74481L, 85566L, 97637L, 101081L, 184089L, 158895L, 153780L, 153681L, 157188L, 142216L, 136437L, 135501L, 111264L, 123259L, 110397L, 146034L, 162900L, 132499L, 121516L, 119651L, 114045L, 112551L, 123209L, 134930L, 132147L, 151327L, 155666L, 158538L, 205766L, 200407L, 219588L, 231954L, 179884L, 159121L, 156148L, 136191L, 132956L, 202086L, 141047L, 118490L, 116595L, 127620L, 135962L, 137419L,

PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed

Question: Can we save any of the created LSTM models themselves? I believe that "pickling" is the standard method to serialize Python objects to a file. Ideally, I wanted to create a Python module containing one or more functions that either let me specify an LSTM model to load or use a hard-coded pre-fit model to generate forecasts based on data passed in to initialize the model. I tried to use it, but it gave me an error. Code that I used:

    # create and fit the LSTM network
    batch_size = 1
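The code that actually failed is not shown, so the following is only a sketch of one way this kind of error can arise: the object being pickled holds a reference to a module, and module objects cannot be pickled. The `Forecaster` classes and the `try_pickle` helper are hypothetical stand-ins, not the asker's LSTM code, and the exact exception type and message vary by Python version.

```python
import math
import pickle

class Forecaster:
    """Toy model that keeps a reference to a module on the instance."""
    def __init__(self, weights):
        self.weights = weights
        self.backend = math  # module objects cannot be pickled

    def predict(self, x):
        return self.backend.fsum(w * x for w in self.weights)

class PicklableForecaster(Forecaster):
    """Same model, but the module reference is left out of the pickled state."""
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["backend"]  # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.backend = math  # restore the module reference after loading

def try_pickle(obj):
    """Return the pickled bytes, or the exception raised while pickling."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as exc:
        return exc

print(type(try_pickle(Forecaster([1.0, 2.0]))))   # an exception type
restored = pickle.loads(pickle.dumps(PicklableForecaster([1.0, 2.0])))
print(restored.predict(3.0))
```

The design choice here is to exclude the unpicklable attribute in `__getstate__` and rebuild it in `__setstate__`, which keeps the rest of the object's state serializable.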

Linear regression with `lm()`: prediction interval for aggregated predicted values

Question: I'm using predict.lm(fit, newdata=newdata, interval="prediction") to get predictions and their prediction intervals (PIs) for new observations. Now I would like to aggregate (sum and mean) these predictions and their PIs based on an additional variable (e.g. a spatial aggregation of predictions for single households at the zip code level). I learned from StackExchange that you cannot aggregate the prediction intervals of single predictions just by aggregating the limits of the prediction
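A small numeric sketch shows why adding up interval limits overstates the uncertainty of a sum. It rests on strong simplifying assumptions that do not hold exactly for lm() predictions: the prediction errors are treated as independent and normal, every interval uses the same quantile, and the uncertainty shared through the estimated coefficients is ignored. The helper names and the half-widths are hypothetical.

```python
import math

def naive_sum_halfwidth(halfwidths):
    """Half-width obtained by simply adding the individual interval half-widths."""
    return sum(halfwidths)

def independent_sum_halfwidth(halfwidths):
    """Half-width of the interval for the sum, ASSUMING independent errors
    and a common quantile: variances add, so half-widths add in quadrature."""
    return math.sqrt(sum(w ** 2 for w in halfwidths))

# Hypothetical half-widths of two household-level prediction intervals.
halfwidths = [3.0, 4.0]

print(naive_sum_halfwidth(halfwidths))        # adds the limits directly
print(independent_sum_halfwidth(halfwidths))  # narrower under independence
```

With a fitted linear model the predictions are correlated through the shared coefficient estimates, so the quadrature result is itself only an approximation; it illustrates the direction of the error rather than the exact interval.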

Get risk predictions in WEKA using own Java code

Question: I already checked the "Making predictions" documentation of WEKA, and it contains explicit instructions for command-line and GUI predictions. I want to know how to get, in my own Java code, a prediction value like the one below, which I got from the GUI using the Agrawal dataset (weka.datagenerators.classifiers.classification.Agrawal):

    inst#, actual, predicted, error, prediction
    1, 1:0, 2:1, +, 0.941
    2, 1:0, 1:0, , 1
    3, 1:0, 1:0, , 1
    4, 1:0, 1:0, , 1
    5, 1:0, 1:0, , 1
    6, 1:0, 1:0, , 1
    7, 1:0, 2:1, +,

Forecast package Prediction Horizon issue in R

Question: I am new to R. I was trying to forecast using the Holt method but am getting this strange error. I am using the forecast package v7.1 with R (version 3.2.5) and RStudio (version 0.99.896). I reinstalled everything from R to RStudio, but that did not help. Only h from 1 to 10 works. Any idea?

    library(forecast)
    library(stats)
    library(base)
    x=data$cost
    k<-holt(x,damped=TRUE)
    m=forecast(k,h=20)

    Error in forecast.forecast(k, h = 20) : Please select a longer horizon when the forecasts are first computed

Other testing cases

Predict/estimate values using randomForest in R

Question: I want to predict values for my Pop_avg field in my unsurveyed areas based on surveyed areas. I am using randomForest, based on a suggestion in response to my earlier question. My surveyed areas:

    > surveyed <- read.csv("summer_surveyed.csv", header = T)
    > surveyed_1 <- surveyed[, -c(1,2,3,5,6,7,9,10,11,12,13,15)]
    > head(surveyed_1, n=1)
      VEGETATION Pop_avg Acres_1
    1 Acer rubrum-Vaccinium corymbosum-Amelanchier spp. 0 27.68884

My unsurveyed areas:

    > unsurveyed <- read.csv("summer_unsurveyed.csv", header = T