prediction

mgcv gam() error: model has more coefficients than data

末鹿安然 submitted on 2020-01-01 07:13:43
Question: I am using GAMs (generalized additive models) on my dataset, which has 32 observations, 6 predictor variables, and a response variable (power). I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model, I get this error message:

Error in gam(formula.hh, data = data, na.action = na.exclude, : Model has more coefficients than data

From this error message, I infer that I have more predictor variables as compared to the number of …
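With 32 rows and six smooth terms, mgcv's default basis dimension (k = 10 per smooth) implies roughly 6 × 9 + 1 = 55 coefficients, well above the number of observations; the usual fix is to shrink each basis, e.g. s(x, k = 4). A rough Python analogue using the pygam library (the library choice, data shapes, and spline counts are assumptions; the question itself uses R):

    import numpy as np
    from pygam import LinearGAM, s

    # 32 observations, 6 predictors -- mirrors the shapes in the question
    rng = np.random.default_rng(0)
    X = rng.random((32, 6))
    y = rng.random(32)

    # Cap the basis size per smooth so the total coefficient count
    # stays below the number of observations (6 terms x 4 splines < 32)
    terms = (s(0, n_splines=4) + s(1, n_splines=4) + s(2, n_splines=4)
             + s(3, n_splines=4) + s(4, n_splines=4) + s(5, n_splines=4))

    gam = LinearGAM(terms).fit(X, y)
    gam.summary()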

Predicting probabilities of classes in case of Gradient Boosting Trees in Spark using the tree output

限于喜欢 submitted on 2020-01-01 05:29:09
Question: It is known that GBTs in Spark give you predicted labels as of now. I was thinking of trying to calculate the predicted probability for a class (say, all the instances falling under a certain leaf). The code to build the GBTs:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.tree.GradientBoostedTrees
    import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    import org.apache…
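The RDD-based MLlib model used here only emits hard labels, but the DataFrame-based API exposes per-class probabilities directly in recent Spark versions (2.2+). A minimal PySpark sketch (the toy data is an assumption; the question's Scala/MLlib pipeline would need porting):

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("gbt-probs").getOrCreate()

    # Toy binary-classification data (placeholder for the question's dataset)
    df = spark.createDataFrame([
        (0.0, Vectors.dense(0.0, 1.1)),
        (1.0, Vectors.dense(2.0, 1.0)),
        (0.0, Vectors.dense(2.1, 1.3)),
        (1.0, Vectors.dense(0.1, 1.2)),
    ], ["label", "features"])

    model = GBTClassifier(maxIter=10, maxDepth=2).fit(df)

    # The 'probability' column holds per-class probabilities (Spark >= 2.2)
    model.transform(df).select("probability", "prediction").show(truncate=False)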

Predicting missing values with scikit-learn's Imputer module

那年仲夏 submitted on 2020-01-01 01:39:09
Question: I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class. I have made a NumPy array, created an Imputer object with strategy='mean', and performed fit_transform() on the NumPy array. When I print the array after performing fit_transform(), the NaNs remain, and I don't get any prediction. What am I doing wrong here? How do I go about predicting the missing values?

    import numpy as np
    from sklearn.preprocessing import Imputer
    X = np.array([[23…
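fit_transform() does not modify the array in place; it returns a new, imputed array that must be captured. A minimal sketch (the data values below are placeholders, since the question's array is truncated; note that Imputer was later replaced by SimpleImputer in newer scikit-learn versions):

    import numpy as np
    from sklearn.impute import SimpleImputer  # successor to sklearn.preprocessing.Imputer

    # Placeholder data; the question's array is cut off at "[[23..."
    X = np.array([[23.0, np.nan, 5.0],
                  [np.nan, 9.0, 3.0],
                  [7.0, 6.0, np.nan]])

    imp = SimpleImputer(strategy="mean")
    X_filled = imp.fit_transform(X)  # returns a NEW array; X itself keeps its NaNs

    print(X_filled)  # column means now fill the missing entries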

GBM multinomial distribution, how to use predict() to get predicted class?

独自空忆成欢 submitted on 2019-12-30 08:27:47
Question: I am using the multinomial distribution from the gbm package in R. When I use the predict function, I get a series of values:

5.086328 -4.738346 -8.492738 -5.980720 -4.351102 -4.738044 -3.220387 -4.732654

but I want to get the probability of each class occurring. How do I recover the probabilities? Thank you.

Answer 1: Take a look at ?predict.gbm; you'll see that there is a "type" parameter to the function. Try out predict(<gbm object>, <new data>, type="response").

Answer 2: predict.gbm(..., type=…
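For a multinomial gbm, the raw predictions are per-class scores on the link scale, and type="response" applies a softmax over them. A quick Python check using the eight values printed in the question (assuming they are the eight class scores for one observation):

    import numpy as np

    # The eight per-class link-scale scores from the question
    scores = np.array([5.086328, -4.738346, -8.492738, -5.980720,
                       -4.351102, -4.738044, -3.220387, -4.732654])

    # Softmax, shifted by the max for numerical stability
    exp_scores = np.exp(scores - scores.max())
    probs = exp_scores / exp_scores.sum()

    print(probs)        # class probabilities
    print(probs.sum())  # -> 1.0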

Classification learning with tensorflow

╄→гoц情女王★ submitted on 2019-12-27 10:52:54
Preface: The MNIST tutorial is the first lesson of the TensorFlow Chinese community docs; the example trains a handwritten-digit recognition model: http://www.tensorfly.cn/tfdoc/tutorials/mnist_beginners.html Reference video: https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/5-01-classifier/

MNIST code (full listing):

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

    def add_layer(inputs, in_size, out_size, activation_function=None):
        # Fully connected layer: random weights, small positive bias,
        # and an optional activation function
        Weights = tf.Variable(tf.random_normal([in_size, out_size]))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
        Wx_plus_b = tf.matmul(inputs, Weights) + biases
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs
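The listing continues in the referenced tutorial with a softmax output layer, a cross-entropy loss, and a training loop. A sketch of that continuation (TF 1.x, matching the code above; the learning rate and batch size are assumptions):

    xs = tf.placeholder(tf.float32, [None, 784])   # 28x28 flattened images
    ys = tf.placeholder(tf.float32, [None, 10])    # one-hot digit labels

    prediction = add_layer(xs, 784, 10, activation_function=tf.nn.softmax)

    # Cross-entropy between one-hot labels and softmax outputs
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1]))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})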

Error when predicting test data with an fda model (flexible discriminant analysis)

杀马特。学长 韩版系。学妹 submitted on 2019-12-25 16:38:22
Question: library(mda) I'm using pred.test <- predict(model.fda, test.data) after model.fda <- fda(Y~., train.data), but obtained the following message: Error in mindist[l] <- ndist[l] : NAs are not allowed in subscripted assignments. Thank you very much for any hint!

Answer 1: I solved this issue after normalizing the data (it was binary values only and maybe too sparse in my case).

Source: https://stackoverflow.com/questions/30172523/error-in-predicting-test-data-when-apply-prediction-fda-model-flexible
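The accepted fix, standardizing the features before fitting, can be sketched in Python with scikit-learn's LinearDiscriminantAnalysis as a stand-in for R's fda (the library swap, data shapes, and binary features are assumptions):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Fake sparse binary features, echoing the answer's description
    rng = np.random.default_rng(0)
    X_train = rng.integers(0, 2, size=(100, 10)).astype(float)
    y_train = rng.integers(0, 2, size=100)
    X_test = rng.integers(0, 2, size=(20, 10)).astype(float)

    # Normalize train and test with the SAME fitted scaler
    scaler = StandardScaler().fit(X_train)
    lda = LinearDiscriminantAnalysis().fit(scaler.transform(X_train), y_train)

    pred_test = lda.predict(scaler.transform(X_test))
    print(pred_test)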

Using slices in Python

安稳与你 submitted on 2019-12-25 08:14:46
Question: I use a dataset from the UCI repository: http://archive.ics.uci.edu/ml/datasets/Energy+efficiency Then I do the following:

    from pandas import *
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.cross_validation import train_test_split

    dataset = read_excel('/Users/Half_Pint_boy/Desktop/ENB2012_data.xlsx')
    dataset…
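The Energy-efficiency dataset has eight input features (X1..X8) and two targets (Y1, Y2), so the usual next step is to slice the DataFrame into predictors and a response before splitting. A sketch of that slicing (the column positions follow the UCI description, and the question itself is truncated before its slicing code, so this is an assumption):

    from pandas import read_excel
    from sklearn.model_selection import train_test_split  # cross_validation was renamed

    dataset = read_excel('ENB2012_data.xlsx')  # path shortened from the question

    # First 8 columns are the inputs X1..X8; column 8 is the first target (Y1)
    X = dataset.iloc[:, :8]
    y = dataset.iloc[:, 8]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    print(X_train.shape, X_test.shape)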

Occurrence prediction

自古美人都是妖i submitted on 2019-12-25 04:32:32
Question: I'd like to know what method is best suited for predicting event occurrences. For example, given five years of data on malaria infection occurrences and several other factors that affect them, I'd like to predict malaria infection occurrences for the next five years. What I thought of doing was to derive a kind of occurrence factor using fuzzy-logic rules, then average the occurrences with the occurrence factor to get the first predicted occurrence, and then average all…
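For count-like occurrence data with covariates, a common baseline is a Poisson regression fit on the historical periods and extrapolated forward. A minimal sketch (the data, feature names, and the model choice are all assumptions; the question does not commit to an approach):

    import numpy as np
    from sklearn.linear_model import PoissonRegressor

    # Hypothetical 5 years of monthly data: [rainfall_index, temperature_index]
    rng = np.random.default_rng(1)
    X_hist = rng.random((60, 2))
    counts = rng.poisson(lam=20 + 30 * X_hist[:, 0])  # fake malaria case counts

    model = PoissonRegressor().fit(X_hist, counts)

    # Predict expected occurrences for future periods given forecast covariates
    X_future = rng.random((60, 2))
    print(model.predict(X_future)[:5])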

Multiclass Decision Forest vs Random Forest

◇◆丶佛笑我妖孽 submitted on 2019-12-25 04:12:24
Question: How does Multiclass Decision Forest differ from Random Forest? What factors do they have in common? There appears to be no clear answer on the web regarding this matter.

Answer 1: Random forests, or random decision forests, are an extension of decision forests (ensembles of decision trees) that combine bagging with random selection of features to construct a collection of decision trees with controlled variance. You may want to look at a very good paper from Microsoft Research on the topic.

Source: https:/…
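In practice a standard random forest handles multiclass problems directly: each tree votes, and the vote fractions act as class probabilities. A small scikit-learn illustration (the dataset and hyperparameters are arbitrary choices for the demo):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)  # 3 classes

    # Bagging + per-split random feature selection, as the answer describes
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    random_state=0).fit(X, y)

    print(forest.predict(X[:3]))        # hard class labels
    print(forest.predict_proba(X[:3]))  # per-class vote fractions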

Generalized additive models for calibration

六眼飞鱼酱① submitted on 2019-12-25 03:41:23
Question: I work on calibration of probabilities, using a probability-mapping approach called generalized additive models. The algorithm I wrote is:

    probMapping = function(x, y, datax, datay) {
      if (length(x) < length(y)) stop("train smaller than test")
      if (length(datax) < length(datay)) stop("train smaller than test")
      datax$prob = x  # trainset: data and raw probabilities
      datay$prob = y  # testset: data and raw probabilities
      prob_map = gam(Target ~ prob, data = datax, family = binomial, trace = TRUE)
      …
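The same raw-probability-to-calibrated-probability mapping can be sketched in Python with scikit-learn, fitting a one-feature logistic mapping on the raw scores (a logistic stand-in for the question's GAM; the data and variable names are made up):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Raw (uncalibrated) probabilities and true 0/1 targets on a training set
    rng = np.random.default_rng(0)
    raw_train = rng.random(500)
    target_train = (rng.random(500) < raw_train ** 2).astype(int)  # miscalibrated

    # Fit the mapping: raw probability -> calibrated probability
    mapper = LogisticRegression().fit(raw_train.reshape(-1, 1), target_train)

    raw_test = rng.random(10)
    calibrated = mapper.predict_proba(raw_test.reshape(-1, 1))[:, 1]
    print(np.c_[raw_test, calibrated])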