Question
I'm trying to calculate inter-observer reliability in R for a scoring system, using Light's kappa from the irr package. It is a fully crossed design in which fifteen observers scored 20 subjects for whether something was present ("1") or not present ("0"). This is my data frame (imported from an Excel sheet):
library(irr)
my.df #my dataframe
a b c d e f g h i j k l m n o
1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
4 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0
5 0 1 0 0 1 1 0 0 0 1 1 0 0 1 0
6 0 1 0 0 1 1 0 0 0 0 0 1 1 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
11 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1
12 0 1 0 0 0 1 0 1 0 1 0 0 1 0 0
13 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 1 0 1 0 1 1 0 0 1 1 1 1 1 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
17 0 1 0 1 1 1 0 0 0 0 0 1 1 1 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
Next I try to calculate the kappa value and get the following response:
kappam.light(my.df) #calculating the kappa-value
Light's Kappa for m Raters
Subjects = 20
Raters = 15
Kappa = NaN
z = NaN
p-value = NaN
Warning messages:
1: In sqrt(varkappa) : NaNs produced
2: In sqrt(varkappa) : NaNs produced
3: In sqrt(varkappa) : NaNs produced
4: In sqrt(varkappa) : NaNs produced
5: In sqrt(varkappa) : NaNs produced
6: In sqrt(varkappa) : NaNs produced
7: In sqrt(varkappa) : NaNs produced
8: In sqrt(varkappa) : NaNs produced
9: In sqrt(varkappa) : NaNs produced
10: In sqrt(varkappa) : NaNs produced
I have already tried changing the class of all the variables to factor, character, numeric, and logical. Nothing works. I suspect it has something to do with the relatively low number of "1" scores. Any suggestions?
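A quick way to check that suspicion is to look at each rater's column directly; this is a minimal sketch assuming the columns of my.df are numeric 0/1 values:
colSums(my.df)                                 # how many "1" scores each rater gave
sapply(my.df, function(x) length(unique(x)))   # 1 means that rater gave the same score to every subject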
EDIT: I found a solution to the problem without having to exclude data. For a prevalence- and bias-adjusted kappa, PABAK can be used in the two-rater case; for multi-rater problems like this one, Randolph's kappa should be used instead. It is based on Fleiss' kappa but uses a fixed chance-agreement term, so it is not affected by the skewed prevalence in the data. Ideal for the problem I had.
An online calculator can be found here. In R, the raters package can be used. I have compared the outcomes of the two methods, and the results are virtually the same (a difference in the sixth decimal place).
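For reference, Randolph's free-marginal kappa is simple enough to compute directly from its definition: observed agreement is calculated as in Fleiss' kappa, while chance agreement is fixed at 1/q for q categories. The sketch below is a hand-rolled illustration against my.df, not the raters package itself, and the function name randolph.kappa is made up for the example:
randolph.kappa <- function(ratings) {
  ratings <- as.matrix(ratings)
  N <- nrow(ratings)                          # subjects
  n <- ncol(ratings)                          # raters
  cats <- sort(unique(as.vector(ratings)))
  q <- length(cats)                           # categories (here 2: 0 and 1)
  # n_ij: number of raters assigning subject i to category j
  counts <- t(apply(ratings, 1, function(r) table(factor(r, levels = cats))))
  Po <- sum(counts * (counts - 1)) / (N * n * (n - 1))  # observed agreement, as in Fleiss' kappa
  Pe <- 1 / q                                           # free-marginal chance agreement
  (Po - Pe) / (1 - Pe)
}
randolph.kappa(my.df)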
Answer 1:
You are getting this error because you have no variability in columns a and i.
First, check the variability across the columns (df below is the my.df from the question):
apply(df,2,sd)
a b c d e f g h i j k l m n o
0.0000000 0.5104178 0.3663475 0.4103913 0.3663475 0.4893605 0.3077935 0.2236068 0.0000000 0.4701623 0.3663475 0.4103913 0.4103913 0.4103913 0.2236068
You can see that columns a and i have no variability: those two raters scored every subject 0. Variability is needed because kappa corrects the observed agreement for chance agreement; with constant ratings that correction breaks down (observed and chance agreement both equal 1), which is what produces the NaNs.
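To see where the NaN comes from, note that kappam.light averages Cohen's kappa (irr's kappa2) over all rater pairs; for the pair of the two constant raters, observed and chance agreement are both 1, so that pairwise kappa works out to 0/0:
kappa2(df[, c("a", "i")])   # both raters constant: the pairwise kappa is NaN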
Therefore, you get output without errors if you remove these two columns:
df$a=NULL
df$i=NULL
kappam.light(df)
Light's Kappa for m Raters
Subjects = 20
Raters = 13
Kappa = 0.19
z = 0
p-value = 1
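Instead of deleting the columns by name, the same cleanup can be done programmatically by dropping every zero-variance column before calling kappam.light; a minimal sketch, assuming the columns are numeric:
keep <- apply(df, 2, sd) > 0    # TRUE for raters whose scores vary
kappam.light(df[, keep])        # same result as dropping a and i by hand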
Source: https://stackoverflow.com/questions/29256701/kappam-light-from-irr-package-in-r-warning-sqrtvarkappa-nans-produced-kappa