ROC

Does anyone know how to generate AUC/ROC area based on the prediction?

有些话、适合烂在心里 submitted on 2019-12-13 12:17:44
Question: I know that the AUC/ROC area (http://weka.wikispaces.com/Area+under+the+curve) in Weka is based on the Mann-Whitney statistic (http://en.wikipedia.org/wiki/Mann-Whitney_U). But my doubt is: if I have 10 labelled instances (Y or N, a binary target attribute) and apply an algorithm (e.g. J48) to the dataset, I get 10 predicted labels for these 10 instances. What exactly should I then use to calculate AUC_Y, AUC_N, and AUC_Avg? The prediction's ranked labels Y and N, or the actual …
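The excerpt cuts off, but the Mann-Whitney connection it cites is easy to check numerically. A minimal sketch (hypothetical probabilities, not Weka output): AUC needs the predicted class probabilities, or some other ranking score, for each instance; the 10 hard Y/N labels alone pin down only a single ROC point.

    # Sketch: AUC equals the Mann-Whitney U statistic of the positive-class scores,
    # normalised by n_pos * n_neg. The scores below are made up for illustration.
    import numpy as np
    from scipy.stats import mannwhitneyu
    from sklearn.metrics import roc_auc_score

    y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])                        # actual labels: Y=1, N=0
    p_yes  = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.55, 0.3, 0.2, 0.35, 0.1])  # P(Y) per instance

    auc_y = roc_auc_score(y_true, p_yes)
    u, _ = mannwhitneyu(p_yes[y_true == 1], p_yes[y_true == 0], alternative="greater")
    n_pos, n_neg = y_true.sum(), len(y_true) - y_true.sum()
    print(auc_y, u / (n_pos * n_neg))   # the two numbers agree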

Sensitivity and specificity changes using a single threshold at 0.5 and a gradient of thresholds, using pROC in R

旧时模样 submitted on 2019-12-13 09:41:22
Question: I am trying to calculate the ROC for a multi-class image model. Since I did not find a good way to do this directly for multi-class classification, I converted it to a binary problem. I have 31 image classes, and using a binary (one-vs-rest) approach I am trying to find the ROC of each of the 31 classes individually.

    df <- read.xlsx("data.xlsx", sheetName = 1, header = F)
    dn <- as.vector(df$X1)  # 31 classes
    model_info <- read.csv("all_new.csv", stringsAsFactors = F)  # details of model output (actual labels, model labels, probability values …
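The excerpt is in R with pROC; purely as an illustration of the same one-vs-rest idea, here is a sklearn sketch. The column names are hypothetical, since the layout of all_new.csv is not shown in the excerpt.

    # One-vs-rest sketch with assumed columns: for each of the 31 classes, treat that
    # class as "positive" and everything else as "negative", then compute its ROC/AUC.
    import pandas as pd
    from sklearn.metrics import roc_auc_score, roc_curve

    model_info = pd.read_csv("all_new.csv")        # assumed columns: actual_label, prob_<class>
    classes = model_info["actual_label"].unique()  # the 31 class names

    aucs = {}
    for cls in classes:
        y_true = (model_info["actual_label"] == cls).astype(int)
        y_score = model_info[f"prob_{cls}"]        # hypothetical per-class probability column
        fpr, tpr, _ = roc_curve(y_true, y_score)   # fpr/tpr could be plotted per class
        aucs[cls] = roc_auc_score(y_true, y_score)
    print(aucs)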

How to plot a ROC curve using a dataframe converted from a CSV file

…衆ロ難τιáo~ submitted on 2019-12-13 08:56:51
Question: I was trying to plot a ROC curve using the documentation provided by sklearn. My data is in a CSV file and looks like this (it has two classes, 'Good' and 'Bad'): screenshot of my CSV file. My code looks like this:

    import numpy as np
    import matplotlib.pyplot as plt
    from itertools import cycle
    import sys
    from sklearn import svm, datasets
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from …
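A minimal end-to-end sketch of the usual recipe (the file and column names are hypothetical, since only a screenshot of the CSV is described): binarize the 'Good'/'Bad' labels, fit a classifier, and pass its scores to roc_curve.

    # Sketch assuming the CSV has feature columns plus a "label" column with values Good/Bad.
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve, auc

    df = pd.read_csv("data.csv")                       # hypothetical file name
    X = df.drop(columns=["label"]).values
    y = (df["label"] == "Good").astype(int).values     # Good -> 1, Bad -> 0

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = SVC(kernel="linear", probability=True).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]             # probability of the positive class

    fpr, tpr, _ = roc_curve(y_te, scores)
    plt.plot(fpr, tpr, label=f"ROC (AUC = {auc(fpr, tpr):.2f})")
    plt.plot([0, 1], [0, 1], "--")
    plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
    plt.legend(); plt.show()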

How can we plot a ROC curve for leave-one-out (LOO) cross-validation using scikit-learn?

落爺英雄遲暮 submitted on 2019-12-13 06:33:30
Question: The scikit-learn website has example code for a ROC curve with stratified k-fold, but none for leave-one-out (LOO) cross-validation. I tried adapting the k-fold code to LOO, but the result is NaN. What could be the problem? I checked the scikit-learn website, but there is no code there for LOO cross-validation. The code for stratified k-fold is given at the link below. https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html Source: https://stackoverflow.com …
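A likely reason, and a common workaround, sketched with hypothetical data: under LOO every test fold contains a single sample, so a per-fold ROC (as in the stratified k-fold example) is undefined and the averaged TPR becomes NaN. Pooling the out-of-fold probabilities and drawing a single ROC curve avoids this.

    # Sketch: collect LOO out-of-fold probabilities, then compute one ROC over all of them.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_predict
    from sklearn.metrics import roc_curve, auc

    X, y = make_classification(n_samples=100, random_state=0)   # toy data
    probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=LeaveOneOut(), method="predict_proba")[:, 1]

    fpr, tpr, _ = roc_curve(y, probs)
    plt.plot(fpr, tpr, label=f"LOO ROC (AUC = {auc(fpr, tpr):.2f})")
    plt.plot([0, 1], [0, 1], "--")
    plt.legend(); plt.show()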

How to plot a ROC curve and calculate AUC for a binary classifier with no probabilities (SVM)?

流过昼夜 submitted on 2019-12-13 06:14:09
Question: I have an SVM classifier (LinearSVC) that outputs final classifications for every sample in the test set, something like 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1 and so on. The ground-truth labels are also something like 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1. I would like to run that SVM with some parameters, generate points for the ROC curve, and calculate the AUC. I could do this myself, but I am sure someone has done it before me for cases like this. Unfortunately, everything I can find is for cases where …
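One common route, sketched with hypothetical data: LinearSVC has no predict_proba, but its decision_function returns a continuous score, and that is all roc_curve needs.

    # Sketch: use the signed distance to the hyperplane as the ROC score.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.metrics import roc_curve, auc

    X, y = make_classification(n_samples=500, random_state=0)   # toy data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = LinearSVC(max_iter=10000).fit(X_tr, y_tr)
    scores = clf.decision_function(X_te)          # continuous score, no probabilities needed

    fpr, tpr, _ = roc_curve(y_te, scores)
    print("AUC:", auc(fpr, tpr))

If only the hard 0/1 predictions are kept, the ROC degenerates to a single non-trivial point; wrapping the model in CalibratedClassifierCV is another way to obtain probability estimates.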

ROCR package… what am I not getting?

心不动则不痛 submitted on 2019-12-13 02:34:22
Question: I am testing a simple case using the ROCR package in R. Basically, here is my code. I have a set of true values and, for each, a prediction; my labels are 1 if the prediction is within |2| of the true value, and 0 otherwise, like this:

    ID <- c(1, 2, 3, 4, 5)
    preds <- c(6, 3, 2, 1, 4)
    truevals <- c(8, 4, 2, 1, 7)
    df <- data.frame(ID, preds, truevals)
    df <- mutate(df, labels = …
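The question uses ROCR in R; purely as an illustration of what this setup computes, here is the same data in a Python/sklearn sketch. Note that the 0/1 labels are derived from the predictions themselves, so the resulting "ROC" only measures how well preds separate those derived labels.

    # Illustration only: replicate the derived labels and feed preds to roc_curve.
    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    preds    = np.array([6, 3, 2, 1, 4])
    truevals = np.array([8, 4, 2, 1, 7])
    labels   = (np.abs(preds - truevals) <= 2).astype(int)   # 1 if within |2| of the truth

    fpr, tpr, thr = roc_curve(labels, preds)
    print(labels, roc_auc_score(labels, preds))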

One article to thoroughly understand accuracy, precision, recall, true positive rate, false positive rate, and ROC/AUC

馋奶兔 submitted on 2019-12-12 21:13:05
Reference: https://zhuanlan.zhihu.com/p/46714763 ROC/AUC is a very important evaluation metric in machine learning and a frequent interview question (asked in roughly 80% of interviews). Understanding it is not actually that hard, but many people run into the same problem: it all makes sense while reading, yet is quickly forgotten afterwards, and the concepts get mixed up. Others memorize the definitions before an interview, then go blank under pressure and answer poorly. I ran into similar problems in my own interviews. In my experience, written tests almost always include multiple-choice questions about these various rates, or give you a scenario and ask which metric to choose. In interviews I have also been asked many times: what are AUC/ROC? What do the x-axis and y-axis represent? What are its advantages? Why use it? I remember that the first time I answered, I confused accuracy, precision, and recall and ended up in a complete mess. Afterwards I went through all the related concepts from start to finish, and in later interviews I answered well. Now I want to share my understanding, in the hope that after reading this article you will remember the ROC/AUC concepts for good. ▌ What is a performance measure? We all know that machine learning builds models, but we do not know in advance how good a model is (i.e. how well it generalizes); it may well be a poor model with weak generalization that cannot predict or classify the test set well. So how do we know whether a model is good or bad? We need an evaluation standard. To understand a model's generalization ability, we need a metric to measure it; that is the purpose of a performance measure. With a metric, we can compare different models.
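For reference (not part of the original excerpt), the standard definitions behind the rates named in the title, with TP, FP, TN, FN denoting true/false positives/negatives:

    \[
    \text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad
    \text{precision} = \frac{TP}{TP + FP}, \qquad
    \text{recall} = \text{TPR} = \frac{TP}{TP + FN}, \qquad
    \text{FPR} = \frac{FP}{FP + TN}
    \]

The ROC curve plots TPR (y-axis) against FPR (x-axis) as the decision threshold varies, and AUC is the area under that curve.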

Binary vectors as y_score argument of roc_curve

穿精又带淫゛_ submitted on 2019-12-12 07:18:00
Question: The sklearn roc_curve docstring states: "y_score : array, shape = [n_samples]. Target scores, can either be probability estimates of the positive class, confidence values, or binary decisions." In what situation would it make sense to set y_score to a binary vector ("binary decisions")? Wouldn't that result in a ROC curve with one point on it, which kind of defeats the point? Answer 1: If you are using a classifier that does not output probability scores (e.g. svm.SVC without an explicit probability …
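A small sketch of the contrast the answer points at (toy data): passing hard 0/1 predictions gives only one informative point between (0,0) and (1,1), while a continuous score such as decision_function gives a full curve.

    # Compare the number of ROC points from a continuous score vs. binary decisions.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC
    from sklearn.metrics import roc_curve

    X, y = make_classification(n_samples=200, random_state=0)
    clf = SVC().fit(X, y)                     # no predict_proba unless probability=True

    fpr_full, tpr_full, _ = roc_curve(y, clf.decision_function(X))  # many thresholds
    fpr_bin,  tpr_bin,  _ = roc_curve(y, clf.predict(X))            # (0,0), one point, (1,1)
    print(len(fpr_full), len(fpr_bin))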

How to explain high AUC-ROC with mediocre precision and recall in unbalanced data?

一曲冷凌霜 submitted on 2019-12-12 05:14:49
Question: I have some machine-learning results that I am trying to make sense of. The task is to predict/label "Irish" vs. "non-Irish". Python 2.7's output (1 = ir, 0 = non-ir):

    Class count:
    0    4090942
    1     940852
    Name: ethnicity_scan, dtype: int64

    Accuracy: 0.874921350119

    Classification report:
                 precision    recall  f1-score   support
              0       0.89      0.96      0.93   2045610
              1       0.74      0.51      0.60    470287
    avg / total       0.87      0.87      0.87   2515897

    Confusion matrix:
    [[1961422   84188]
     [ 230497  239790]]

    AUC-ir = 0.901238104773

As you can see, the …
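A quick way to see how these numbers can coexist: the precision/recall figures above are taken at one fixed decision threshold, while AUC-ROC aggregates over all thresholds. Recomputing the per-class rates from the confusion matrix in the question (a plain Python 3 sketch):

    # Rows of the confusion matrix are true classes (0 = non-Irish, 1 = Irish), columns are predictions.
    tn, fp = 1961422, 84188
    fn, tp = 230497, 239790

    recall_1    = tp / (tp + fn)          # ~0.51: about half of the Irish are found at this threshold
    precision_1 = tp / (tp + fp)          # ~0.74
    fpr         = fp / (fp + tn)          # ~0.04: very few non-Irish are flagged
    print(recall_1, precision_1, fpr)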

Why am I getting a 1.000 ROC area value even when I don't have 100% accuracy?

一笑奈何 submitted on 2019-12-12 03:49:18
Question: I am using Weka as a classifier, and it has worked great for me so far. However, in my last test I got a 1.000 ROC area value (which, if I remember correctly, represents perfect classification) without having 100% accuracy, as can be seen in the confusion matrix in the figure. My question is: am I interpreting the results incorrectly, or am I getting wrong results (maybe the classifier I am using is badly programmed, although I don't think that's likely)? Classification output. Thank you!
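A ROC area of 1.0 only says the predicted scores rank every positive above every negative; accuracy is measured at one fixed threshold, so the two can disagree. A tiny sketch with toy numbers (not the asker's Weka output):

    # Perfect ranking (AUC = 1.0) but imperfect accuracy at the 0.5 threshold.
    from sklearn.metrics import roc_auc_score, accuracy_score

    y_true = [0, 0, 1, 1]
    scores = [0.10, 0.40, 0.45, 0.90]                  # every positive scores above every negative
    preds  = [1 if s >= 0.5 else 0 for s in scores]    # thresholding at 0.5 misses one positive

    print(roc_auc_score(y_true, scores))               # 1.0
    print(accuracy_score(y_true, preds))               # 0.75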