prediction

R prediction package vs. Stata margins

你。 submitted on 2020-02-27 07:26:17
Question: I'm switching from Stata to R, and I get inconsistent results between the prediction package, which I use to compute marginal predictions, and Stata's margins command when fixing a variable at a value x. Here is the example:

    library(dplyr)
    library(prediction)
    d <- data.frame(x1 = factor(c(1, 1, 1, 2, 2, 2), levels = c(1, 2)),
                    x2 = factor(c(1, 2, 3, 1, 2, 3), levels = c(1, 2, 3)),
                    x3 = factor(c(1, 2, 1, 2, 1, 2), levels = c(1, 2)),
                    y = c(3.1, 2.8, 2.5, 4.3, 4.0, 3.5))
    m2 <- lm(y ~ x1 + x2 + x3, d)
    summary(m2)
    marg2a
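The idea behind margins-style averaged predictions can be sketched in Python with plain least squares (a hypothetical analogue, not the prediction package's implementation): fix one dummy-coded factor at a level for every row, predict, and average. The simplified design below (intercept, x1 == 2 dummy, x3 == 2 dummy) is an assumption made for illustration only.

```python
import numpy as np

# Toy data mirroring the question: y modeled on dummy-coded factors.
# Columns: intercept, x1==2 dummy, x3==2 dummy (simplified design)
X = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
], dtype=float)
y = np.array([3.1, 2.8, 2.5, 4.3, 4.0, 3.5])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Margin" at x1 = 2: set the x1 dummy to 1 for every row,
# keep the other predictors as observed, then average the predictions
X_fixed = X.copy()
X_fixed[:, 1] = 1.0
avg_prediction = (X_fixed @ beta).mean()
print(avg_prediction)
```

Differences between R and Stata here usually come down to what the other covariates are held at (as observed vs. at their means vs. at reference levels), so it is worth checking which convention each command uses.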

logistic regression predicts “NA” probability in R - why?

前提是你 submitted on 2020-02-05 05:10:27
Question: I have run a logistic regression in R using the following code:

    logistic.train.model3 <- glm(josh.model2,
                                 family = binomial(link = logit),
                                 data = auth,
                                 na.action = na.exclude)
    print(summary(logistic.train.model3))

My response variable is binary, taking values of 1 or 0. When I look at the summary, everything looks fine: every variable has a coefficient. However, when I try to output the predicted probabilities using the following code:

    auth$predict.train.logistic <- predict(logistic.train.model3
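A likely cause is missing values in the predictors: with na.action = na.exclude, R drops those rows for fitting but pads the prediction vector back out with NA so it aligns with the original data. A minimal Python sketch of the same propagation (made-up coefficients and data, not the asker's auth frame):

```python
import math

# Hypothetical fitted logistic model: score = intercept + coef * x
coefs = (0.5, 1.2)           # (intercept, slope) — made-up values
xs = [0.3, None, 1.7]        # the second observation has a missing predictor

def predict_prob(x):
    if x is None:            # mirror na.exclude: keep the row, return NA
        return float("nan")
    z = coefs[0] + coefs[1] * x
    return 1.0 / (1.0 + math.exp(-z))

probs = [predict_prob(x) for x in xs]
print(probs)  # the middle entry is nan, like R's padded NA
```

Checking `colSums(is.na(auth))` on the predictor columns is a quick way to confirm this in R.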

Polynomial Regression nonsense Predictions

佐手、 submitted on 2020-02-01 05:39:27
Question: Suppose I want to fit a linear regression model with a degree-two (orthogonal) polynomial and then predict the response. Here is the code for the first model (m1):

    x = 1:100
    y = -2 + 3*x - 5*x^2 + rnorm(100)
    m1 = lm(y ~ poly(x, 2))
    prd.1 = predict(m1, newdata = data.frame(x = 105:110))

Now let's try the same model, but instead of using poly(x, 2) I will use its columns, like:

    m2 = lm(y ~ poly(x, 2)[, 1] + poly(x, 2)[, 2])
    prd.2 = predict(m2, newdata = data.frame(x = 105:110))

Let's look at the summaries of m1 and m2.

    > summary(m1)
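The pitfall is that the basis used at fit time must be reproducible at predict time: m1 stores the orthogonal-polynomial coefficients and rebuilds the basis for new x, while indexing poly(x, 2) by column hides that from predict. A Python sketch of the safe pattern with an ordinary quadratic basis, where the same transformation is applied to the new x values (illustrative only, seeded fake data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 101, dtype=float)
y = -2 + 3 * x - 5 * x**2 + rng.normal(size=x.size)

# Fit y ~ 1 + x + x^2; polyfit returns the highest-degree coefficient first
coefs = np.polyfit(x, y, deg=2)

# Predict: apply the *same* basis construction to the new inputs
x_new = np.arange(105, 111, dtype=float)
pred = np.polyval(coefs, x_new)
print(pred)
```

Because the basis is rebuilt consistently, the extrapolated values follow the fitted downward parabola instead of producing nonsense.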

R or Python - looping over the test data - prediction validation for the next 24 hours (96 values per day)

人盡茶涼 submitted on 2020-01-24 12:29:47
Question: I have a large dataset; below are the training and test datasets. train_data runs from 2016-01-29 to 2017-12-31.

    head(train_data)
            date           Date_time Temp     Ptot      JFK      AEH      ART       CS       CP
    1 2016-01-29 2016-01-29 00:00:00 30.3 1443.888 52.87707 49.36879 28.96548 6.239999 49.61212
    2 2016-01-29 2016-01-29 00:15:00 30.3 1410.522 49.50248 49.58356 26.37977 5.024000 49.19649
    3 2016-01-29 2016-01-29 00:30:00 30.3 1403.191 50.79809 49.04253 26.15317 5.055999 47.48126
    4 2016-01-29 2016-01-29 00:45:00 30.3 1384.337 48.88359
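One common pattern for this kind of validation is to walk through the test period one day (96 quarter-hour values) at a time: predict the whole block, score it against the observed values, then move on. A minimal Python sketch with made-up data and a naive persistence forecast standing in for the real model (purely to show the loop structure):

```python
# Walk-forward validation in daily blocks of 96 quarter-hour readings.
# The "model" is persistence (repeat the previous day) — a placeholder.
STEPS_PER_DAY = 96

observed = [float(i % STEPS_PER_DAY) for i in range(STEPS_PER_DAY * 3)]  # 3 fake days

errors = []
for day_start in range(STEPS_PER_DAY, len(observed), STEPS_PER_DAY):
    history = observed[:day_start]
    actual = observed[day_start:day_start + STEPS_PER_DAY]
    forecast = history[-STEPS_PER_DAY:]          # persistence: yesterday's 96 values
    mae = sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)
    errors.append(mae)

print(errors)  # one MAE per validated day
```

The same loop works in R with `seq(..., by = 96)` over row indices; only the model call inside the loop changes.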

How to write code for link-prediction precision assessment in Python?

与世无争的帅哥 submitted on 2020-01-14 06:42:09
Question: I am doing a link-prediction problem using the Adamic-Adar index. The dataset is a grid network (an edge list with 1000 links). I randomly selected 80% (800) of the edges from the observed dataset. I need to select the 200 highest-scoring predicted links from preds, as below, and also calculate the precision ratio. I don't know what to do next. How would I do it? Help!

    import numpy as np
    import networkx as nx

    G = nx.read_edgelist('Grid.txt', create_using=nx.Graph(), nodetype=int)
    preds = nx.adamic_adar_index(G)
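A sketch of the missing steps, written without networkx so it is self-contained (toy graph and held-out edges, not the asker's Grid.txt): score every non-edge with the Adamic-Adar index, sort, take the top k, and compute precision as the fraction of those k that appear in the held-out edge set.

```python
import math
from itertools import combinations

# Toy observed graph (adjacency sets) and held-out "true" edges
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
held_out = {(1, 4), (3, 4)}

def adamic_adar(u, v):
    # Sum 1/log(degree(w)) over common neighbours w of u and v
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

# Score every unordered non-edge
candidates = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
scored = sorted(candidates, key=lambda e: adamic_adar(*e), reverse=True)

k = 2                                   # the question would use k = 200
top_k = scored[:k]
precision = sum(1 for e in top_k if e in held_out) / k
print(top_k, precision)
```

With networkx, `sorted(preds, key=lambda t: t[2], reverse=True)[:200]` plays the role of `scored[:k]` here; the precision step is identical.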

Replication of scikit svm.SVR.predict(X)

99封情书 submitted on 2020-01-14 02:59:06
Question: I'm trying to replicate scikit-learn's svm.SVR.predict(X) and don't know how to do it correctly. I want to do this because, after training the SVM with an RBF kernel, I would like to implement the prediction in another programming language (Java), and I need to be able to export the model's parameters to perform predictions on unknown cases. On scikit-learn's documentation page I see that there are support_ and support_vectors_ attributes, but I don't understand how to replicate the
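For an RBF-kernel SVR, the prediction is a weighted sum of kernel evaluations against the support vectors plus an intercept. A Python sketch with made-up exported parameters (in scikit-learn these would come from `support_vectors_`, `dual_coef_`, `intercept_`, and the fitted gamma; the numbers below are invented for illustration):

```python
import numpy as np

# Made-up exported model parameters (stand-ins for a trained SVR's
# support_vectors_, dual_coef_, intercept_, and gamma)
support_vectors = np.array([[0.0, 1.0], [2.0, 0.5]])
dual_coef = np.array([0.7, -0.3])
intercept = 0.1
gamma = 0.5

def svr_predict(x):
    # RBF-kernel SVR decision function:
    # sum_i dual_coef[i] * exp(-gamma * ||x - sv_i||^2) + intercept
    sq_dists = np.sum((support_vectors - x) ** 2, axis=1)
    return float(dual_coef @ np.exp(-gamma * sq_dists) + intercept)

print(svr_predict(np.array([0.0, 1.0])))
```

Porting this to Java only requires exporting the three arrays and two scalars and reimplementing the one-line sum.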

Cross-validation for glm() models

故事扮演 submitted on 2020-01-11 15:33:50
Question: I'm trying to do 10-fold cross-validation for some glm models that I built earlier in R. I'm a little confused about the cv.glm() function in the boot package, although I've read a lot of help files. When I provide the following call:

    library(boot)
    cv.glm(data, glmfit, K = 10)

does the "data" argument here refer to the whole dataset or only to the test set? The examples I have seen so far provide the "data" argument as the test set, but that did not really make sense, such as why do 10
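cv.glm expects the whole dataset: it performs the K splits internally, refitting on K-1 folds and scoring on the held-out fold each time. A minimal Python sketch of that mechanic (a mean-only model stands in for the glm, purely to show the data flow; K = 5 and the numbers are made up):

```python
# Manual K-fold cross-validation: the FULL dataset goes in, and the
# routine itself carves out each held-out fold.
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
K = 5

fold_errors = []
for k in range(K):
    test = data[k::K]                          # held-out fold
    train = [v for i, v in enumerate(data) if i % K != k]
    fit = sum(train) / len(train)              # "model": the training mean
    mse = sum((v - fit) ** 2 for v in test) / len(test)
    fold_errors.append(mse)

cv_error = sum(fold_errors) / K
print(cv_error)
```

Passing only a test set to cv.glm would make the function refit the model on slices of the test data, which is why those examples did not make sense.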

R: multiple linear regression model and prediction model

会有一股神秘感。 submitted on 2020-01-09 12:04:33
Question: Starting from a linear model

    model1 = lm(temp ~ alt + sdist)

I need to develop a prediction model, where new data will come in hand and predictions about temp will be made. I have tried doing something like this:

    model2 = predict.lm(model1, newdata = newdataset)

However, I am not sure this is the right way. What I would like to know is whether this is the right way to go in order to make predictions about temp. Also, I am a bit confused when it comes to newdataset: which values should be filled in
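The approach is right; the key requirement is that newdataset contains columns named exactly alt and sdist, one row per case to predict. A Python sketch of the same mechanic with ordinary least squares (all numbers made up for illustration):

```python
import numpy as np

# Training data: temp modeled on alt and sdist (made-up values)
alt   = np.array([100.0, 200.0, 300.0, 400.0])
sdist = np.array([10.0, 8.0, 5.0, 2.0])
temp  = np.array([15.0, 13.0, 10.0, 7.0])

X = np.column_stack([np.ones_like(alt), alt, sdist])
beta, *_ = np.linalg.lstsq(X, temp, rcond=None)

# "newdataset": new rows must supply the SAME predictors, alt and sdist
new_alt, new_sdist = np.array([150.0, 350.0]), np.array([9.0, 3.0])
X_new = np.column_stack([np.ones_like(new_alt), new_alt, new_sdist])
print(X_new @ beta)
```

In R, fitting with a data argument (`lm(temp ~ alt + sdist, data = mydata)`) makes the column matching in predict explicit and avoids scoping surprises.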

Merge two regression prediction models (on subsets of a data frame) back into one column of the data frame

拈花ヽ惹草 submitted on 2020-01-06 17:46:51
Question: I am building on a similar question asked and answered on SO a year ago. It relates to this post: how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame. I will use the same data as there, but with a new column. I create the data:

    dat = read.table(text = "
    cats birds wolfs snakes trees
    0    3     8     7      2
    1    3     8     7      3
    1    1     2     3      2
    0    1     2     3      1
    0    1     2     3      2
    1    6     1     1      3
    0    6     1     1      1
    1    6     1     1      1
    ", header = TRUE)

Model the number of wolves, using
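The merging step can be sketched in Python: split the rows by a condition, fit one model per subset, and write each subset's predictions back into a single column aligned with the original row order. This is a pure-Python stand-in for the R workflow; the subset rule (birds > 2 vs. birds <= 2) and the mean-only "model" are illustrative choices, not the linked answer's exact models.

```python
# Predict wolfs with one simple fit per subset, then merge the two sets
# of predictions back into one column in the original row order.
rows = [  # (cats, birds, wolfs, snakes, trees) — data from the question
    (0, 3, 8, 7, 2), (1, 3, 8, 7, 3), (1, 1, 2, 3, 2), (0, 1, 2, 3, 1),
    (0, 1, 2, 3, 2), (1, 6, 1, 1, 3), (0, 6, 1, 1, 1), (1, 6, 1, 1, 1),
]

def fit_mean(subset):
    # Stand-in "model": predict the subset's mean wolfs value
    vals = [r[2] for r in subset]
    return sum(vals) / len(vals)

high = [r for r in rows if r[1] > 2]   # birds > 2
low  = [r for r in rows if r[1] <= 2]
model_high, model_low = fit_mean(high), fit_mean(low)

# One merged prediction column, aligned with the original rows
predicted = [model_high if r[1] > 2 else model_low for r in rows]
print(predicted)
```

In R the equivalent is indexed assignment, e.g. `dat$pred[idx] <- predict(m, dat[idx, ])` once per subset, which fills the single pred column without reordering rows.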