auc

R Caret Random Forest AUC too good to be true?

时光毁灭记忆、已成空白 submitted on 2020-01-05 02:27:09
Question: Relative newbie to predictive modeling here; most of my training/experience is in inferential stats. I'm trying to predict student college graduation within 4 years. The basic issue is that I've done the data cleaning (imputing, centering, scaling); split that processed/transformed data into training (70%) and testing (30%) sets; and balanced the data using two approaches (because the data was 65% = 0, 35% = 1, and I've found inconsistent advice on what counts as unbalanced, but one source suggested anything not…
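The question is truncated, but a minimal sketch of the workflow it describes (synthetic data standing in for the student records; createDataPartition, twoClassSummary, and train are standard caret functions) might look like this:

    library(caret)
    library(pROC)

    # Synthetic stand-in for the student data; 'grad' is the two-level outcome
    set.seed(42)
    n  <- 500
    df <- data.frame(gpa = rnorm(n), credits = rnorm(n))
    df$grad <- factor(ifelse(df$gpa + rnorm(n) > 0, "yes", "no"))

    # 70/30 split, stratified on the outcome
    idx      <- createDataPartition(df$grad, p = 0.7, list = FALSE)
    train_df <- df[idx, ]
    test_df  <- df[-idx, ]

    ctrl <- trainControl(method = "cv", number = 10,
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary)  # reports ROC/Sens/Spec

    rf_fit <- train(grad ~ ., data = train_df,
                    method = "rf", metric = "ROC", trControl = ctrl)

    # Judge the AUC on the held-out test set, not on resampling alone;
    # a large gap between the two is one sign of leakage or overfitting
    test_probs <- predict(rf_fit, test_df, type = "prob")[, "yes"]
    auc(roc(test_df$grad, test_probs))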

How to interpret this triangular-shaped ROC AUC curve?

本秂侑毒 submitted on 2020-01-01 19:26:19
Question: I have 10+ features and a dozen thousand cases for training a logistic regression to classify people's race. The first example is French vs. non-French, and the second is English vs. non-English. The results are as follows:

1 = fr, 0 = non-fr
Class count:
0    69109
1    30891
dtype: int64
Accuracy: 0.95126
Classification report:
              precision  recall  f1-score  support
           0       0.97    0.96      0.96    34547
           1       0.92    0.93      0.92    15453
 avg / total       0.95    0.95      0.95    50000
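A triangular ROC curve like this usually means the curve was computed from hard 0/1 predicted labels rather than continuous scores, so there is effectively only one operating point (one threshold) to interpolate through. A minimal R illustration with pROC (synthetic data, not the poster's):

    library(pROC)

    set.seed(1)
    y     <- rbinom(500, 1, 0.3)
    score <- ifelse(y == 1, rnorm(500, mean = 1), rnorm(500, mean = 0))
    label <- as.numeric(score > 0.5)       # hard 0/1 predictions

    roc_scores <- roc(y, score)   # proper curve: one point per threshold
    roc_labels <- roc(y, label)   # "triangle": a single real threshold
    plot(roc_scores)
    lines(roc_labels, col = "red")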

Calculate cut-off that maximizes sensitivity vs specificity using ROCR

不羁岁月 submitted on 2020-01-01 19:14:33
Question: I am trying to calculate the cut-off point that maximizes sensitivity vs. specificity. I am using the ROCR package and have managed to plot sensitivity vs. specificity. However, I don't know how to calculate the cut-off point that maximizes them. Ideally I would like a label in the graph that shows the cut-off and the coordinates at that point, but any suggestion to solve this question will be greatly appreciated. pred <- prediction( ROCR.simple$hello, ROCR.simple…
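A common recipe (not from the post itself) is to pick the cut-off maximizing Youden's J, i.e. sensitivity + specificity - 1, which ROCR exposes via the alpha.values slot; a sketch using ROCR's bundled ROCR.simple data:

    library(ROCR)
    data(ROCR.simple)

    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    perf <- performance(pred, measure = "sens", x.measure = "spec")

    sens <- perf@y.values[[1]]
    spec <- perf@x.values[[1]]
    cut  <- perf@alpha.values[[1]]

    best <- which.max(sens + spec)          # Youden's J statistic
    plot(perf)
    points(spec[best], sens[best], pch = 19)
    text(spec[best], sens[best], pos = 4,
         labels = sprintf("cut-off = %.3f (spec = %.2f, sens = %.2f)",
                          cut[best], spec[best], sens[best]))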

Machine Learning Basics: ROC Curves and AUC Calculation Explained

橙三吉。 submitted on 2020-01-01 10:18:59
AUC & ROC. AUC is a model-evaluation metric that applies only to binary classifiers. For binary classification there are other metrics too, such as the loss function (logloss), accuracy, and precision, but AUC and logloss are used more often than accuracy and precision. The reason is that many machine-learning models output probabilities, and turning a probability into a class label requires setting a threshold; the choice of that threshold affects accuracy and precision to some degree. Binary classification algorithms are therefore evaluated with AUC, the Area Under the Curve of the ROC (receiver operating characteristic).

X-axis: 1 - Specificity, the false positive rate (FPR): the proportion of all negative samples that are predicted positive, FPR = FP / (FP + TN).
Y-axis: Sensitivity, the true positive rate (TPR): the proportion of all positive samples that are predicted positive, TPR = TP / (TP + FN).

1. The ROC curve
The ROC (receiver operating characteristic) curve: each point on the curve reflects the sensitivity to the same signal stimulus. The x-axis is the false positive rate (FPR), i.e. 1 - specificity, the proportion of all negative instances that are classified as positive; …
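A minimal R illustration of these definitions (pROC is not mentioned in the original post; it is used here purely as a convenient implementation):

    library(pROC)

    set.seed(7)
    y     <- rbinom(1000, 1, 0.4)                            # true labels
    score <- ifelse(y == 1, rnorm(1000, 1), rnorm(1000, 0))  # model scores

    r <- roc(y, score)   # sweeps every threshold, recording (FPR, TPR) pairs
    auc(r)               # area under that curve
    plot(r)              # note: pROC draws specificity on a reversed x-axis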

Plot ROC curve from Cross-Validation (training) data in R

前提是你 submitted on 2019-12-31 10:02:10
Question: I would like to know if there is a way to plot the average ROC curve from the cross-validation data of an SVM-RFE model generated with the caret package. My results are:

Recursive feature selection
Outer resampling method: Cross-Validated (10 fold, repeated 5 times)
Resampling performance over subset size:

Variables    ROC   Sens   Spec Accuracy  Kappa  ROCSD SensSD SpecSD AccuracySD KappaSD Selected
        1 0.6911 0.0000 1.0000   0.5900 0.0000 0.2186 0.0000 0.0000     0.0303  0.0000
        2 0.7600 0.3700 0.8067   0.6280 0…
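One common pattern (a sketch, not the poster's code) is to save the resampled hold-out predictions and pool them into a single cross-validated ROC curve; shown here with caret's train() and a plain SVM on synthetic data rather than with rfe() itself:

    library(caret)
    library(pROC)

    set.seed(10)
    n  <- 300
    df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
    df$y <- factor(ifelse(df$x1 + rnorm(n) > 0, "pos", "neg"),
                   levels = c("neg", "pos"))

    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary,
                         savePredictions = "final")  # keep hold-out predictions

    fit <- train(y ~ ., data = df, method = "svmLinear",
                 metric = "ROC", trControl = ctrl)

    # Pool the cross-validated predictions into one "average" ROC curve
    cv_roc <- roc(fit$pred$obs, fit$pred$pos)
    plot(cv_roc)
    auc(cv_roc)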

What is a threshold in a Precision-Recall curve?

寵の児 submitted on 2019-12-31 08:54:28
Question: I am aware of the concept of precision as well as the concept of recall, but I am finding it very hard to understand the idea of a 'threshold', which makes any P-R curve possible. Imagine I have a model to build that predicts the recurrence (yes or no) of cancer in patients, using some decent classification algorithm on relevant features. I split my data into training and testing sets. Let's say I trained the model using the training data and got my precision and recall metrics using the test data. …
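A minimal sketch of what the threshold does (synthetic data, base R only): each threshold t converts the model's probabilities into yes/no predictions, and each t therefore contributes one (recall, precision) point to the curve.

    set.seed(3)
    y    <- rbinom(200, 1, 0.3)                  # 1 = cancer recurred
    prob <- ifelse(y == 1, rbeta(200, 4, 2),     # model's predicted
                           rbeta(200, 2, 4))     # probabilities

    for (t in c(0.2, 0.5, 0.8)) {
      pred <- as.numeric(prob >= t)              # apply the threshold
      tp <- sum(pred == 1 & y == 1)
      fp <- sum(pred == 1 & y == 0)
      fn <- sum(pred == 0 & y == 1)
      cat(sprintf("t = %.1f  precision = %.3f  recall = %.3f\n",
                  t, tp / (tp + fp), tp / (tp + fn)))
    }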

Precision-Recall Curve computation by PRROC package in R

南笙酒味 submitted on 2019-12-31 05:20:10
Question: My question is related to this question. I am interested in computing the precision-recall curve (PRC) and the area under it. I found a nice R package, PRROC, that does both tasks. According to the package description (page 5), the function pr.curve takes two parameters: 1) the classification scores of the data points belonging to the positive class only, and 2) the classification scores of the data points belonging to the negative class only (see manual page 7). The example they provide is: # create artificial scores as…
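In the same spirit as the package's own example (artificial scores, reconstructed here as a sketch rather than quoted from the manual):

    library(PRROC)

    set.seed(2)
    fg <- rnorm(300, mean = 1)   # scores of positive-class data points
    bg <- rnorm(300, mean = 0)   # scores of negative-class data points

    pr <- pr.curve(scores.class0 = fg,   # positive class
                   scores.class1 = bg,   # negative class
                   curve = TRUE)
    pr$auc.integral                      # area under the PR curve
    plot(pr)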

Difference in ROC-AUC scores in sklearn RandomForestClassifier vs. auc methods

亡梦爱人 submitted on 2019-12-24 03:54:19
Question: I am getting different ROC-AUC scores from sklearn's RandomForestClassifier and from the roc_curve/auc methods, respectively. The following code got me an ROC-AUC (i.e. gs.best_score_) of 0.878:

def train_model(mod = None, params = None, features = None,
                outcome = ...outcomes array..., metric = 'roc_auc'):
    gs = GridSearchCV(mod, params, scoring=metric, loss_func=None,
                      score_func=None, fit_params=None, n_jobs=-1,
                      iid=True, refit=True, cv=10, verbose=0,
                      pre_dispatch='2*n_jobs', error_score='raise')…

3-class AUC calculation in R (pROC package)

情到浓时终转凉″ submitted on 2019-12-24 03:18:10
Question: I ran into a problem doing 3-class ROC analysis in R and obtained a very annoying result (see here). Now I am trying a different way to solve it. The data set is iris and the classifier is multinomial logistic regression from the nnet package. The code is below:

# iris data (3-class ROC)
library(nnet)
library(pROC)  # should be installed first: install.packages('pROC')
data(iris)

# 3-class logistic regression
model = multinom(Species ~ ., data = iris, trace = F)

# confusion matrix (z1) & accuracy (E1)…
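For reference, pROC also ships a multi-class AUC (the Hand & Till, 2001 average of pairwise AUCs); a sketch continuing the snippet above, assuming a recent pROC version that accepts a matrix of class probabilities:

    # continuing from the snippet above
    probs <- predict(model, iris, type = "probs")  # per-class probabilities

    mroc <- multiclass.roc(iris$Species, probs)    # Hand & Till multi-class AUC
    auc(mroc)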