AUC

Plotting linear discriminant analysis, classification tree and Naive Bayes ROC curves on a single plot

为君一笑 submitted on 2019-12-06 22:17:51
The data set, called 'LDA.scores', is given at the very bottom of the page. This is a classification task in which I applied three supervised machine learning classification techniques to the data set. All the code is supplied to show how these ROC curves were produced. I apologise for asking a loaded question, but I have been trying to solve these issues with different combinations of code for almost two weeks, so if anyone can help me, thank you. The main issue is that the Naive Bayes curve shows a perfect score of 1, which is obviously wrong, and I cannot work out how to incorporate the linear…
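The question is asked in R, but the recipe is framework-independent: score each fitted model on the same held-out set, compute one ROC curve per model from its class probabilities, and overlay the curves. A minimal sketch in Python/scikit-learn under that assumption (the data here is a synthetic stand-in, not the asker's LDA.scores set); note that a Naive Bayes "AUC of 1" is very often the result of scoring on the training data rather than on held-out data:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve, roc_auc_score

    X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    models = {"LDA": LinearDiscriminantAnalysis(),
              "Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
              "Naive Bayes": GaussianNB()}
    for name, model in models.items():
        # Use held-out probabilities, never training ones: scoring the
        # training set is what typically produces a "perfect" AUC of 1.
        p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        fpr, tpr, _ = roc_curve(y_te, p)
        plt.plot(fpr, tpr, label=f"{name} (AUC={roc_auc_score(y_te, p):.3f})")
    plt.plot([0, 1], [0, 1], "k--")  # chance diagonal
    plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
    plt.legend(); plt.show()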

How to get a ROC curve for a decision tree?

为君一笑 submitted on 2019-12-06 10:57:10
Question: I am trying to plot the ROC curve and compute the AUROC for a decision tree. My code was something like: clf.fit(x,y); y_score = clf.fit(x,y).decision_function(test[col]); pred = clf.predict_proba(test[col]); print(sklearn.metrics.roc_auc_score(actual, y_score)); fpr, tpr, thre = sklearn.metrics.roc_curve(actual, y_score). Output: AttributeError: 'DecisionTreeClassifier' object has no attribute 'decision_function'. Basically, the error comes up while computing y_score. Please explain what y_score is and how to solve…
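DecisionTreeClassifier genuinely has no decision_function; y_score just means a continuous ranking score for the positive class, and for tree models the usual substitute is the positive-class column of predict_proba. A minimal runnable sketch with stand-in data (the asker's x, y, test[col] and actual are replaced by a synthetic split):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn import metrics

    X, y = make_classification(n_samples=400, random_state=0)  # stand-in data
    x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
    # Trees have no decision_function; the positive-class probability from
    # predict_proba plays the role of y_score for ranking-based metrics.
    y_score = clf.predict_proba(x_test)[:, 1]
    print(metrics.roc_auc_score(y_test, y_score))
    fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)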

How to compute AUC with the ROCR package

浪尽此生 submitted on 2019-12-05 03:33:36
I have fitted an SVM model and created the ROC curve with the ROCR package. How can I compute the area under the curve (AUC)? set.seed(1); tune.out = tune(svm, Negative~.-Positive, data = trainSparse, kernel = "radial", ranges = list(cost = c(0.1,1,10,100,1000), gamma = c(0.5,1,2,3,4))); summary(tune.out); best = tune.out$best.model; ## prediction on the test set: ypred = predict(best, testSparse, type = "class"); table(testSparse$Negative, ypred); ### ROC curve: yhat.opt = predict(best, testSparse, decision.values = TRUE); fitted.opt = attributes(yhat.opt)$decision.values; rocplot(fitted.opt, testSparse["Negative"], main =…
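In ROCR itself the AUC comes from the same prediction object used for the curve: performance(prediction(fitted.opt, labels), "auc"), reading the number out of the result's y.values slot. For comparison, here is the equivalent computation in Python/scikit-learn, with a synthetic stand-in for the asker's sparse data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=300, random_state=1)  # stand-in data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

    svm = SVC(kernel="rbf").fit(X_tr, y_tr)
    # decision_function returns the continuous margin, the analogue of
    # ROCR's decision.values; roc_auc_score integrates the ROC curve from it.
    scores = svm.decision_function(X_te)
    print(roc_auc_score(y_te, scores))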

How to interpret this triangular-shaped ROC curve?

放肆的年华 submitted on 2019-12-04 18:55:49
I have 10+ features and a dozen thousand cases with which to train a logistic regression for classifying people's race. The first example is French vs. non-French, and the second is English vs. non-English. The results are as follows:

//////////////////////////////////////////////////////
1 = fr, 0 = non-fr
Class count:
0    69109
1    30891
dtype: int64
Accuracy: 0.95126
Classification report:
             precision    recall  f1-score   support
          0       0.97      0.96      0.96     34547
          1       0.92      0.93      0.92     15453
avg / total       0.95      0.95      0.95     50000
Confusion matrix:
[[33229  1318]
 [ 1119 14334]]
AUC = 0.944717975754
//////////////////////////////////////////////////////…
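A triangular ROC curve almost always means the curve was built from hard 0/1 predicted labels instead of probabilities: a single threshold yields only one interior point, so the "curve" is just the two straight segments joining (0,0), (FPR,TPR) and (1,1). A sketch of the difference on stand-in data:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve

    X, y = make_classification(n_samples=2000, random_state=0)  # stand-in data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # Hard labels -> one interior point -> triangular "curve".
    fpr_hard, tpr_hard, _ = roc_curve(y_te, clf.predict(X_te))
    # Probabilities -> one point per threshold -> a proper curve.
    fpr_prob, tpr_prob, _ = roc_curve(y_te, clf.predict_proba(X_te)[:, 1])

    plt.plot(fpr_hard, tpr_hard, label="from predicted labels (triangular)")
    plt.plot(fpr_prob, tpr_prob, label="from probabilities")
    plt.legend(); plt.show()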

Plot multiple ROC curves for logistic regression models in R

陌路散爱 submitted on 2019-12-04 15:54:27
I have a logistic regression model (in R), fit as fit6 <- glm(formula = survived ~ ascore + gini + failed, data = records, family = binomial); summary(fit6). I'm using the pROC package to draw ROC curves and compute the AUC for six models, fit1 through fit6. This is how I plot a single ROC curve: prob6 = predict(fit6, type = c("response")); records$prob6 = prob6; g6 <- roc(survived ~ prob6, data = records); plot(g6). But is there a way to combine the ROCs for all six models in one plot and display all their AUCs, and if possible the confidence intervals too? You can use the add = TRUE argument of plot…
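In pROC, each additional curve is drawn by calling plot() on the roc object with add = TRUE, and auc() / ci.auc() give the AUC and its confidence interval per model. For comparison, here is the same overlay-with-AUC-labels pattern in Python/scikit-learn, using six nested logistic regressions on synthetic data as stand-ins for fit1 through fit6:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve, roc_auc_score

    X, y = make_classification(n_samples=1000, n_informative=6, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Six nested models as stand-ins for fit1..fit6: each uses one more
    # feature column than the last.
    for k in range(1, 7):
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:, :k], y_tr)
        p = clf.predict_proba(X_te[:, :k])[:, 1]
        fpr, tpr, _ = roc_curve(y_te, p)
        plt.plot(fpr, tpr, label=f"fit{k} (AUC={roc_auc_score(y_te, p):.2f})")
    plt.xlabel("FPR"); plt.ylabel("TPR"); plt.legend(); plt.show()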

Model evaluation metric: AUC

一笑奈何 submitted on 2019-12-04 10:57:46
Reference link: https://www.iteye.com/blog/lps-683-2387643
Questions: What is AUC? What is AUC used for? How is AUC computed (understanding AUC in depth)?
What is AUC?
Confusion matrix: the confusion matrix is the foundation for understanding most evaluation metrics, and without question the foundation for understanding AUC as well. Plenty of material introduces the concept; here a classic diagram is used to explain what a confusion matrix is. Clearly, the confusion matrix contains four pieces of information:
1. True negative (TN): the number of samples that are actually negative and are predicted negative
2. False positive (FP): the number of samples that are actually negative but are predicted positive
3. False negative (FN): the number of samples that are actually positive but are predicted negative
4. True positive (TP): the number of samples that are actually positive and are predicted positive
With the confusion matrix in front of you, the relationships and concepts are easy to keep straight, but over time they are also easy to forget. A handy mnemonic is to read each term in two parts: the first part, True/False, says whether the prediction was correct; the second part, positive/negative, is the predicted class. Each cell of the confusion matrix is therefore a combination of prediction-correctness and predicted-class. Re-reading the four cells this way (each denotes a sample count, omitted below):
1. TN: predicted negative, and the prediction was right
2. FP…
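A quick numeric check of the four cells using scikit-learn, together with the two rates the ROC curve is built from (the labels below are made up for illustration):

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 1, 0, 1, 1, 0, 1]
    # With labels ordered [0, 1], the matrix is [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(tn, fp, fn, tp)   # 3 1 1 3
    tpr = tp / (tp + fn)    # true positive rate (recall)
    fpr = fp / (fp + tn)    # false positive rate
    print(tpr, fpr)         # 0.75 0.25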

Why, when I use GridSearchCV with roc_auc scoring, is the score different for grid_search.score(X,y) and roc_auc_score(y, y_predict)?

我的梦境 submitted on 2019-12-04 01:30:46
Question: I am using stratified 10-fold cross-validation to find the model that predicts y (a binary outcome) from X (which has 34 features) with the highest AUC. I set up the GridSearchCV: log_reg = LogisticRegression(); parameter_grid = {'penalty': ["l1", "l2"], 'C': np.arange(0.1, 3, 0.1)}; cross_validation = StratifiedKFold(n_splits=10, shuffle=True, random_state=100); grid_search = GridSearchCV(log_reg, param_grid=parameter_grid, scoring='roc_auc', cv=cross_validation). And then do the cross-validation: grid…
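The usual cause of the mismatch: with scoring='roc_auc', grid_search.score(X, y) ranks samples by the model's continuous output (decision_function or predict_proba), while roc_auc_score(y, y_predict) computed from .predict() feeds hard 0/1 labels into the AUC, which is a different, coarser quantity (and an in-sample score on the full data differs from the cross-validated score as well). A sketch of the comparison on stand-in data:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=500, random_state=100)  # stand-in data
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
    grid = GridSearchCV(LogisticRegression(max_iter=1000),
                        {'C': np.arange(0.1, 3, 0.1)},
                        scoring='roc_auc', cv=cv).fit(X, y)

    # Same quantity, computed two ways: the scorer vs. probabilities by hand.
    print(grid.score(X, y))
    print(roc_auc_score(y, grid.predict_proba(X)[:, 1]))
    # Hard labels give a *different*, threshold-dependent number.
    print(roc_auc_score(y, grid.predict(X)))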

Machine learning visualization: model evaluation and parameter tuning

被刻印的时光 ゝ submitted on 2019-12-03 17:30:20
This article elaborates on machine learning model evaluation and parameter tuning, organized around two questions:
"Know why it works": when the machine learning model you chose is running, you should know how it works;
"Go one better": a step further, you should know how to make that model work even better.
Methods of model evaluation. Generally speaking, a numeric score such as the F1 score or R-squared can tell us how good a trained machine learning model is, and there are many other metrics for evaluating a fitted model. As you may have guessed, I will propose combining visualization with numeric scores to judge machine learning models more intuitively. The next few sections share some useful tools.
First, a caveat: a single score or a single line can never fully evaluate a machine learning model, and judging a model 'good' or 'bad' detached from the real-world scenario is meaningless. A fitted model is 'good' if it can handle a small training sample and still hit more of the prediction set, or if it gets better results than other fitted model forms or than yesterday's predictive model.
Below are some standard metrics: confusion_matrix, mean_squared_error, r2_score; these can be used to judge classifiers or regressions. The table lists the Scikit-Learn function and a description for each. Evaluating classification models:
Metric | Description | Scikit-learn function
Precision | precision | from sklearn.metrics…
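As a compact illustration of the three functions named above, on made-up labels and targets:

    from sklearn.metrics import confusion_matrix, mean_squared_error, r2_score

    # Classification: the confusion matrix summarizes a classifier's errors.
    y_true_cls = [1, 0, 1, 1, 0]
    y_pred_cls = [1, 0, 0, 1, 0]
    print(confusion_matrix(y_true_cls, y_pred_cls))

    # Regression: squared error and R-squared summarize a regressor's fit.
    y_true_reg = [2.0, 1.5, 3.0, 4.2]
    y_pred_reg = [1.8, 1.6, 2.7, 4.0]
    print(mean_squared_error(y_true_reg, y_pred_reg))
    print(r2_score(y_true_reg, y_pred_reg))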

Feature Selection in caret rfe + sum with ROC

ぃ、小莉子 submitted on 2019-12-03 14:02:55
Question: I have been trying to apply recursive feature elimination using the caret package. What I need is for rfe to use AUC as its performance measure. After a month of googling I cannot get the process working. Here is the code I have used: library(caret); library(doMC); registerDoMC(cores = 4); data(mdrr); subsets <- c(1:10); ctrl <- rfeControl(functions = caretFuncs, method = "cv", repeats = 5, number = 10, returnResamp = "final", verbose = TRUE); trainctrl <- trainControl(classProbs = TRUE); caretFuncs$summary <-…
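The truncated assignment is typically completed with caret's twoClassSummary, which computes ROC from class probabilities, after which rfe is called with metric = "ROC" alongside the classProbs = TRUE control shown above. For comparison, AUC-driven recursive feature elimination in Python/scikit-learn looks like this (a synthetic stand-in for the mdrr data):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFECV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=300, n_features=10,
                               n_informative=4, random_state=0)
    # RFECV drops features recursively and keeps the subset with the best
    # cross-validated AUC, the same idea as caret's rfe with metric = "ROC".
    selector = RFECV(LogisticRegression(max_iter=1000),
                     cv=StratifiedKFold(10), scoring='roc_auc')
    selector.fit(X, y)
    print(selector.n_features_, selector.support_)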

Scikit-learn GridSearchCV AUC performance

匿名 (unverified) submitted on 2019-12-03 09:14:57
Question: I'm using GridSearchCV to identify the best set of parameters for a random forest classifier. PARAMS = {'max_depth': [8, None], 'n_estimators': [500, 1000]}; rf = RandomForestClassifier(); clf = grid_search.GridSearchCV(estimator=rf, param_grid=PARAMS, scoring='roc_auc', cv=5, n_jobs=4); clf.fit(data, labels), where data and labels are respectively the full dataset and the corresponding labels. Now, I compared the performance returned by GridSearchCV (from clf.grid_scores_) with a "manual" AUC estimation: aucs = []; for fold in range(0, n…
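The numbers in grid_scores_ (cv_results_ in current scikit-learn) are averages of scores on held-out CV folds, so a fair manual check must hold data out the same way rather than scoring refit models on samples they were trained on. A sketch of a like-for-like comparison, written against the modern sklearn.model_selection API with stand-in data:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score, KFold

    data, labels = make_classification(n_samples=400, random_state=0)
    PARAMS = {'max_depth': [8, None], 'n_estimators': [100, 200]}

    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    clf = GridSearchCV(RandomForestClassifier(random_state=0), PARAMS,
                       scoring='roc_auc', cv=cv, n_jobs=4).fit(data, labels)
    print(clf.best_score_)  # mean held-out AUC of the best parameter set

    # Manual estimate on the SAME splits: the two numbers should agree.
    best = RandomForestClassifier(random_state=0, **clf.best_params_)
    aucs = cross_val_score(best, data, labels, scoring='roc_auc', cv=cv)
    print(np.mean(aucs))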