Question
I have a factor named SMOKE with levels "Y" and "N". Missing values were replaced with NA (from the initial level "NULL"). However, when I view the factor I get something like this:
head(SMOKE)
N N <NA> Y Y N
Levels: Y N
Why is R displaying NA as <NA>? And is there a difference?
Answer 1:
When you are dealing with factors, an NA wrapped in angle brackets (<NA>) indicates that it is in fact NA.
When it is NA without brackets, it is not NA, but rather a proper factor level whose label is "NA".
# Note a 'real' NA and a string with the word "NA"
x <- factor(c("hello", NA, "world", "NA"))
x
[1] hello <NA> world NA
Levels: hello NA world <~~ The string appears as a level, the actual NA does not.
as.numeric(x)
[1] 1 NA 3 2 <~~ The string has a numeric value (here, 2, alphabetically); the NA's numeric value is just NA
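The distinction can also be checked programmatically rather than by eye. The following is a minimal sketch using the same factor x as above; the helper names real_missing and literal_na are made up for illustration:

```r
# Same factor as above: one real NA and one string "NA"
x <- factor(c("hello", NA, "world", "NA"))

# is.na() is TRUE only for the real missing value
real_missing <- is.na(x)                  # FALSE TRUE FALSE FALSE

# The string "NA" is an ordinary level; the real NA is not a level
has_na_label <- "NA" %in% levels(x)       # TRUE
has_na_level <- anyNA(levels(x))          # FALSE

# Elements whose label is the two letters "NA" (guarding against the real NA)
literal_na <- !is.na(x) & as.character(x) == "NA"   # FALSE FALSE FALSE TRUE
```

Comparing labels through as.character() avoids relying on the level order, which can depend on the locale.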
Edit to answer @Arun's question:
R is simply trying to distinguish between a string whose value is the two letters "NA" and an actual missing value, NA.
Thus the difference you see when displaying df versus df$y. Example:
df <- data.frame(x=1:4, y=c("a", NA_character_, "c", "NA"), stringsAsFactors=FALSE)
Note the two different styles of NA:
> df
x y
1 1 a
2 2 <NA>
3 3 c
4 4 NA
However, if we look at just df$y:
[1] "a" NA "c" "NA"
But, if we remove the quotation marks (similar to what we see when printing a data.frame to the console):
print(df$y, quote=FALSE)
[1] a <NA> c NA
And thus, we once again have the distinction of NA via the angle brackets.
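Whatever the printing style, the two kinds of entry can be told apart in code. A minimal sketch on the same df as above; the names missing_rows and literal_rows are hypothetical, chosen only for this example:

```r
# Same data frame as above: row 2 is a real NA, row 4 is the string "NA"
df <- data.frame(x = 1:4, y = c("a", NA_character_, "c", "NA"),
                 stringsAsFactors = FALSE)

# Rows where y is actually missing
missing_rows <- is.na(df$y)                   # FALSE TRUE FALSE FALSE

# Rows where y is the literal two-letter string "NA"
literal_rows <- !is.na(df$y) & df$y == "NA"   # FALSE FALSE FALSE TRUE
```

The !is.na() guard is there because comparing a missing value with == yields NA rather than FALSE.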
Answer 2:
It is just the way that R displays NA in a factor:
> as.factor(NA)
[1] <NA>
Levels:
>
> f <- factor(c(1:3, NA))
> levels(f)
[1] "1" "2" "3"
> f
[1] 1 2 3 <NA>
Levels: 1 2 3
> is.na(f)
[1] FALSE FALSE FALSE TRUE
Presumably this is how R differentiates between NA and "NA" when a factor is printed, since a factor prints without quotes, even for character labels/levels:
> f2 <- factor(c("NA",NA))
> f2
[1] NA <NA>
Levels: NA
> is.na(f2)
[1] FALSE TRUE
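To tie this back to the question's SMOKE factor, here is a minimal sketch of one way the "NULL" entries could end up as real missing values. The raw vector is invented for illustration and is not the asker's data; the approach relies on factor() turning any value not listed in levels into NA:

```r
# Hypothetical raw data, with "NULL" standing in for missing answers
raw <- c("N", "N", "NULL", "Y", "Y", "N")

# Values not listed in `levels` become real NAs
SMOKE <- factor(raw, levels = c("Y", "N"))

levels(SMOKE)   # "Y" "N"  (no "NULL" and no NA level)
is.na(SMOKE)    # FALSE FALSE TRUE FALSE FALSE FALSE
```

Printing SMOKE then shows the third element as <NA>, which is a genuine missing value and not a level.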
Answer 3:
Perhaps one exception is data.table. There, a character column seems to print a missing value as < NA >, while a numeric column prints it as NA. NB: I added extra spaces in < NA >; otherwise this webpage did not display it properly.
library("data.table")
y<-data.table(a=c("a","b",NA))
print(y)
a
1: a
2: b
3: < NA >
factor(y$a)
[1] a b < NA >
Levels: a b
## we enter a numeric argument
y<-data.table(a=c(1,2,NA))
print(y)
a
1: 1
2: 2
3: NA
factor(y$a)
[1] 1 2 < NA >
Levels: 1 2
Source: https://stackoverflow.com/questions/16253789/what-is-the-difference-between-na-and-na