prediction

mgcv gam() error: model has more coefficients than data

末鹿安然 submitted on 2020-01-01 07:13:43
Question: I am using GAMs (generalized additive models) on my dataset, which has 32 observations, 6 predictor variables, and a response variable (power). I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model, I get this error message:

Error in gam(formula.hh, data = data, na.action = na.exclude, : Model has more coefficients than data

From this error message, I infer that I have more predictor variables as compared to the number of …
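With 32 rows and six smooth terms, mgcv's default basis dimension (k = 10 per smooth) implies roughly 6 × 9 + 1 = 55 coefficients, well above the number of observations; the usual fix is to shrink each basis, e.g. s(x, k = 4). A rough Python analogue using the pygam library (the library choice, data shapes, and spline counts are assumptions; the question itself uses R):

    import numpy as np
    from pygam import LinearGAM, s

    # 32 observations, 6 predictors -- mirrors the shapes in the question
    rng = np.random.default_rng(0)
    X = rng.random((32, 6))
    y = rng.random(32)

    # Cap the basis size per smooth so the total coefficient count
    # stays below the number of observations (6 terms x 4 splines < 32)
    terms = (s(0, n_splines=4) + s(1, n_splines=4) + s(2, n_splines=4)
             + s(3, n_splines=4) + s(4, n_splines=4) + s(5, n_splines=4))

    gam = LinearGAM(terms).fit(X, y)
    gam.summary()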

Predicting probabilities of classes in case of Gradient Boosting Trees in Spark using the tree output

限于喜欢 submitted on 2020-01-01 05:29:09
Question: It is known that GBTs in Spark give you predicted labels as of now. I was thinking of trying to calculate the predicted probability for a class (say, all the instances falling under a certain leaf). The code to build the GBTs:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.tree.GradientBoostedTrees
    import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    import org.apache…
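The RDD-based MLlib model used here only emits hard labels, but the DataFrame-based API exposes per-class probabilities directly in recent Spark versions (2.2+). A minimal PySpark sketch (the toy data is an assumption; the question's Scala/MLlib pipeline would need porting):

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("gbt-probs").getOrCreate()

    # Toy binary-classification data (placeholder for the question's dataset)
    df = spark.createDataFrame([
        (0.0, Vectors.dense(0.0, 1.1)),
        (1.0, Vectors.dense(2.0, 1.0)),
        (0.0, Vectors.dense(2.1, 1.3)),
        (1.0, Vectors.dense(0.1, 1.2)),
    ], ["label", "features"])

    model = GBTClassifier(maxIter=10, maxDepth=2).fit(df)

    # The 'probability' column holds per-class probabilities (Spark >= 2.2)
    model.transform(df).select("probability", "prediction").show(truncate=False)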

Predicting missing values with scikit-learn's Imputer module

那年仲夏 submitted on 2020-01-01 01:39:09
Question: I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class. I have made a NumPy array, created an Imputer object with strategy='mean', and performed fit_transform() on the NumPy array. When I print the array after performing fit_transform(), the NaNs remain, and I don't get any prediction. What am I doing wrong here? How do I go about predicting the missing values?

    import numpy as np
    from sklearn.preprocessing import Imputer
    X = np.array([[23…
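fit_transform() does not modify the array in place; it returns a new, imputed array that must be captured. A minimal sketch (the data values below are placeholders, since the question's array is truncated; note that Imputer was later replaced by SimpleImputer in newer scikit-learn versions):

    import numpy as np
    from sklearn.impute import SimpleImputer  # successor to sklearn.preprocessing.Imputer

    # Placeholder data; the question's array is cut off at "[[23..."
    X = np.array([[23.0, np.nan, 5.0],
                  [np.nan, 9.0, 3.0],
                  [7.0, 6.0, np.nan]])

    imp = SimpleImputer(strategy="mean")
    X_filled = imp.fit_transform(X)  # returns a NEW array; X itself keeps its NaNs

    print(X_filled)  # column means now fill the missing entries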

GBM multinomial distribution, how to use predict() to get predicted class?

独自空忆成欢 submitted on 2019-12-30 08:27:47
Question: I am using the multinomial distribution from the gbm package in R. When I use the predict function, I get a series of values:

5.086328 -4.738346 -8.492738 -5.980720 -4.351102 -4.738044 -3.220387 -4.732654

but I want to get the probability of each class occurring. How do I recover the probabilities? Thank you.

Answer 1: Take a look at ?predict.gbm; you'll see that there is a "type" parameter to the function. Try out predict(<gbm object>, <new data>, type="response").

Answer 2: predict.gbm(..., type=…
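For a multinomial gbm, the raw predictions are per-class scores on the link scale, and type="response" applies a softmax over them. A quick Python check using the eight values printed in the question (assuming they are the eight class scores for one observation):

    import numpy as np

    # The eight per-class link-scale scores from the question
    scores = np.array([5.086328, -4.738346, -8.492738, -5.980720,
                       -4.351102, -4.738044, -3.220387, -4.732654])

    # Softmax, shifted by the max for numerical stability
    exp_scores = np.exp(scores - scores.max())
    probs = exp_scores / exp_scores.sum()

    print(probs)        # class probabilities
    print(probs.sum())  # -> 1.0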

Classification learning with tensorflow

╄→гoц情女王★ submitted on 2019-12-27 10:52:54
Preface: The MNIST tutorial is the first lesson of the TensorFlow Chinese community docs; the example trains a handwritten-digit recognition model: http://www.tensorfly.cn/tfdoc/tutorials/mnist_beginners.html Reference video: https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/5-01-classifier/

MNIST code (full listing):

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

    def add_layer(inputs, in_size, out_size, activation_function=None):
        # Fully connected layer: random weights, small positive bias,
        # and an optional activation function
        Weights = tf.Variable(tf.random_normal([in_size, out_size]))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
        Wx_plus_b = tf.matmul(inputs, Weights) + biases
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs
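The listing continues in the referenced tutorial with a softmax output layer, a cross-entropy loss, and a training loop. A sketch of that continuation (TF 1.x, matching the code above; the learning rate and batch size are assumptions):

    xs = tf.placeholder(tf.float32, [None, 784])   # 28x28 flattened images
    ys = tf.placeholder(tf.float32, [None, 10])    # one-hot digit labels

    prediction = add_layer(xs, 784, 10, activation_function=tf.nn.softmax)

    # Cross-entropy between one-hot labels and softmax outputs
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1]))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})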

Error when predicting test data with an fda model (flexible discriminant analysis)

杀马特。学长 韩版系。学妹 submitted on 2019-12-25 16:38:22
Question: library(mda) I'm using pred.test <- predict(model.fda, test.data) after model.fda <- fda(Y~., train.data), but obtained the following message: Error in mindist[l] <- ndist[l] : NAs are not allowed in subscripted assignments. Thank you very much for any hint!

Answer 1: I solved this issue after normalizing the data (it was binary values only and maybe too sparse in my case).

Source: https://stackoverflow.com/questions/30172523/error-in-predicting-test-data-when-apply-prediction-fda-model-flexible
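The accepted fix, standardizing the features before fitting, can be sketched in Python with scikit-learn's LinearDiscriminantAnalysis as a stand-in for R's fda (the library swap, data shapes, and binary features are assumptions):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Fake sparse binary features, echoing the answer's description
    rng = np.random.default_rng(0)
    X_train = rng.integers(0, 2, size=(100, 10)).astype(float)
    y_train = rng.integers(0, 2, size=100)
    X_test = rng.integers(0, 2, size=(20, 10)).astype(float)

    # Normalize train and test with the SAME fitted scaler
    scaler = StandardScaler().fit(X_train)
    lda = LinearDiscriminantAnalysis().fit(scaler.transform(X_train), y_train)

    pred_test = lda.predict(scaler.transform(X_test))
    print(pred_test)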

Using slices in Python

安稳与你 submitted on 2019-12-25 08:14:46
Question: I use a dataset from the UCI repository: http://archive.ics.uci.edu/ml/datasets/Energy+efficiency Then I do the following:

    from pandas import *
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.cross_validation import train_test_split

    dataset = read_excel('/Users/Half_Pint_boy/Desktop/ENB2012_data.xlsx')
    dataset…
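The Energy-efficiency dataset has eight input features (X1..X8) and two targets (Y1, Y2), so the usual next step is to slice the DataFrame into predictors and a response before splitting. A sketch of that slicing (the column positions follow the UCI description, and the question itself is truncated before its slicing code, so this is an assumption):

    from pandas import read_excel
    from sklearn.model_selection import train_test_split  # cross_validation was renamed

    dataset = read_excel('ENB2012_data.xlsx')  # path shortened from the question

    # First 8 columns are the inputs X1..X8; column 8 is the first target (Y1)
    X = dataset.iloc[:, :8]
    y = dataset.iloc[:, 8]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    print(X_train.shape, X_test.shape)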

Occurrence prediction

自古美人都是妖i submitted on 2019-12-25 04:32:32
Question: I'd like to know what method is best suited for predicting event occurrences. For example, given five years of data on malaria infection occurrences and several other factors that affect them, I'd like to predict malaria infection occurrences for the next five years. What I thought of doing was to derive a kind of occurrence factor using fuzzy-logic rules, then average the occurrences with the occurrence factor to get the first predicted occurrence, and then average all…
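For count-like occurrence data with covariates, a common baseline is a Poisson regression fit on the historical periods and extrapolated forward. A minimal sketch (the data, feature names, and the model choice are all assumptions; the question does not commit to an approach):

    import numpy as np
    from sklearn.linear_model import PoissonRegressor

    # Hypothetical 5 years of monthly data: [rainfall_index, temperature_index]
    rng = np.random.default_rng(1)
    X_hist = rng.random((60, 2))
    counts = rng.poisson(lam=20 + 30 * X_hist[:, 0])  # fake malaria case counts

    model = PoissonRegressor().fit(X_hist, counts)

    # Predict expected occurrences for future periods given forecast covariates
    X_future = rng.random((60, 2))
    print(model.predict(X_future)[:5])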

Multiclass Decision Forest vs Random Forest

◇◆丶佛笑我妖孽 submitted on 2019-12-25 04:12:24
Question: How does Multiclass Decision Forest differ from Random Forest? What factors do they have in common? There appears to be no clear answer on the web regarding this matter.

Answer 1: Random forests, or random decision forests, are an extension of decision forests (ensembles of decision trees) that combine bagging with random selection of features to construct a collection of decision trees with controlled variance. You may want to look at a very good paper from Microsoft Research on the topic.

Source: https:/…
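In practice a standard random forest handles multiclass problems directly: each tree votes, and the vote fractions act as class probabilities. A small scikit-learn illustration (the dataset and hyperparameters are arbitrary choices for the demo):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)  # 3 classes

    # Bagging + per-split random feature selection, as the answer describes
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    random_state=0).fit(X, y)

    print(forest.predict(X[:3]))        # hard class labels
    print(forest.predict_proba(X[:3]))  # per-class vote fractions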

Generalized additive models for calibration

六眼飞鱼酱① submitted on 2019-12-25 03:41:23
Question: I work on calibration of probabilities, using a probability-mapping approach called generalized additive models. The algorithm I wrote is:

    probMapping = function(x, y, datax, datay) {
      if (length(x) < length(y)) stop("train smaller than test")
      if (length(datax) < length(datay)) stop("train smaller than test")
      datax$prob = x  # trainset: data and raw probabilities
      datay$prob = y  # testset: data and raw probabilities
      prob_map = gam(Target ~ prob, data = datax, family = binomial, trace = TRUE)
      …
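The same raw-probability-to-calibrated-probability mapping can be sketched in Python with scikit-learn, fitting a one-feature logistic mapping on the raw scores (a logistic stand-in for the question's GAM; the data and variable names are made up):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Raw (uncalibrated) probabilities and true 0/1 targets on a training set
    rng = np.random.default_rng(0)
    raw_train = rng.random(500)
    target_train = (rng.random(500) < raw_train ** 2).astype(int)  # miscalibrated

    # Fit the mapping: raw probability -> calibrated probability
    mapper = LogisticRegression().fit(raw_train.reshape(-1, 1), target_train)

    raw_test = rng.random(10)
    calibrated = mapper.predict_proba(raw_test.reshape(-1, 1))[:, 1]
    print(np.c_[raw_test, calibrated])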