Question
The whole vector is fine and has no NAs:
> summary(data$marks)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 6.00 6.00 6.02 7.00 7.00
> length(data$marks)
[1] 2528
However, when I summarize a subset selected by a condition, I get lots of NAs:
> summary(data[data$student=="John",]$marks)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.000 6.000 6.000 6.169 7.000 7.000 464
> length(data[data$student=="John",]$marks)
[1] 523
Answer 1:
I think the problem is that you have missing values for student. When you subset by student, each row where student is NA makes the comparison return NA rather than TRUE or FALSE, and indexing with NA produces a row of NAs, so marks is NA for those rows in your subset. Wrap the subsetting condition in which() to avoid this problem, since which() returns only the positions where the condition is TRUE. Here are a few examples that will hopefully clarify what's happening:
# Fake data
set.seed(103)
dat = data.frame(group=rep(LETTERS[1:3], each=3),
value=rnorm(9))
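dat$group[1] = NA                    # introduce a missing group
dat$value                            # all nine values, no NAs
dat[dat$group=="B", "value"]         # NA for row 1, then the three B values
dat[which(dat$group=="B"), "value"]  # only the three B values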
# Simpler example
x = c(10,20,30,40, NA)
x>20              # FALSE FALSE TRUE TRUE NA
x[x>20]           # 30 40 NA
which(x>20)       # 3 4
x[which(x>20)]    # 30 40
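To check whether this diagnosis applies, it can help to count the missing values in the grouping column and compare a few NA-safe ways of subsetting. The sketch below uses a small made-up data frame as a stand-in for the asker's data; the values, and the names n_missing, john_which, john_in and john_subset, are assumptions for illustration only.

```r
# Hypothetical stand-in for the asker's data frame
data <- data.frame(
  student = c("John", NA, "Mary", "John", NA),
  marks   = c(6, 7, 5, 7, 6),
  stringsAsFactors = FALSE
)

# Number of rows whose student is missing; each one becomes an NA row
# when the data frame is indexed with data$student == "John"
n_missing <- sum(is.na(data$student))

# Three NA-safe ways to select John's marks
john_which  <- data[which(data$student == "John"), "marks"]
john_in     <- data[data$student %in% "John", "marks"]
john_subset <- subset(data, student == "John")$marks
```

In the question, 464 of the 523 returned elements were NA, which would be consistent with 464 rows having a missing student and 59 rows belonging to John.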
Answer 2:
First, note that NA=="foo" evaluates to NA. When you index a vector with an NA value, the result for that position is NA.
t = c(1,2,3)
t[c(1,NA)]
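The same thing happens with a logical index, which is the case in the question. A minimal sketch, using made-up values and hypothetical variable names (keep, picked, picked_safe):

```r
# Hypothetical values, for illustration only
student <- c("John", NA, "Mary")
marks   <- c(6, 7, 5)

keep   <- student == "John"   # TRUE NA FALSE
picked <- marks[keep]         # 6 NA: the NA in keep yields an NA element

# Treating NA as FALSE explicitly drops that element
picked_safe <- marks[!is.na(keep) & keep]   # 6
```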
Answer 3:
A tidyverse solution. I find these easier to read than base R.
library(tidyverse)
data %>%
filter(student == "John") %>%  # filter() drops rows where the condition is NA
pull(marks) %>%                # extract the marks column as a vector
summary()
Source: https://stackoverflow.com/questions/34055552/na-when-trying-to-summarize-a-subset-of-data-r