问题
I have a fairly large data frame that has multiple "-" which represent missing data. The data frame consisted of multiple Excel files, which could not use the "na.strings =" or alternative function, so I had to import them with the "-" representation.
How can I replace all "-" in the data frame with NA / missing values? The data frame consists of 200 columns of characters, factors, and integers.
So far I have tried:
sum(df %in c("-"))
returns: [1] 0
df[df=="-"] <-NA #does not do anything
library(plyr)
df <- revalue(df, c("-",NA))
returns: Error in revalue(tmp, c("-", NA)) :
x is not a factor or a character vector.
library(anchors)
df <- replace.value(df,colnames(df),"-",as.character(NA))
Error in charToDate(x) :
character string is not in a standard unambiguous format
The data frame consists of 200 columns of characters, factors, and integers, so I can see why the last two do not work correctly. Any help would be appreciated.
回答1:
Since you're already using tidyverse functions, you can easily use na_if
from dplyr
within your pipes.
For example, I have a dataset where 999 is used to fill in a non-answer:
df <- tibble(
alpha = c("a", "b", "c", "d", "e"),
val1 = c(1, 999, 3, 8, 999),
val2 = c(2, 8, 999, 1, 2))
If I wanted to change val1
so 999 is NA, I could do:
df %>%
mutate(val1 = na_if(val1, 999))
In your case, it sounds like you want to replace a value across multiple variables, so using mutate_at
or mutate_if
would be more appropriate:
df %>%
mutate_at(vars(val1, val2), na_if, 999)
replaces all instances of 999 in both val1
and val2
with NA
and now looks like this:
# A tibble: 5 x 3
alpha val1 val2
<chr> <dbl> <dbl>
1 a 1. 2.
2 b NA 8.
3 c 3. NA
4 d 8. 1.
5 e NA 2.
回答2:
Here's a solution that will do it:
> library(dplyr)
> test <- tibble(x = c('100', '20.56', '0.003', '-', ' -'), y = 5:1)
> makeNA <- function(x) str_replace(x,'-',NA_character_)
> mutate_all(test, funs(makeNA))
# A tibble: 5 x 2
x y
<chr> <chr>
1 100 5
2 20.56 4
3 0.003 3
4 NA 2
5 NA 1
回答3:
I believe the simplest solution is with base R function is.na<-. It's meant to solve precisely that issue.
First, make up some data. Then set the required values to NA
.
set.seed(247) # make the results reproducible
df <- data.frame(X = 1:10, Y = sample(c("-", letters[1:2]), 10, TRUE))
is.na(df) <- df == "-"
df
# X Y
#1 1 a
#2 2 b
#3 3 b
#4 4 a
#5 5 <NA>
#6 6 b
#7 7 a
#8 8 <NA>
#9 9 b
#10 10 a
来源:https://stackoverflow.com/questions/49457877/r-replace-specific-value-contents-with-na