After merging one data frame with another, I'm left with NAs in the occasional row. I'd like to set these NAs to 0 so I can perform calculations with them.
Why not try a small helper function:
na.zero <- function (x) {
x[is.na(x)] <- 0
return(x)
}
na.zero(df)
You can use the output of is.na to replace directly with subsetting:
bothbeams.data[is.na(bothbeams.data)] <- 0
Or with a reproducible example:
dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
x y
1 1 0
2 2 4
3 3 5
4 0 6
However, be careful using this method on a data frame containing factors that also have missing values:
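> # stringsAsFactors = TRUE is needed on R >= 4.0.0, where strings are no longer converted to factors by default
> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)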
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
It "works":
> d
x y
1 0 a
2 2 <NA>
3 3 c
...but you will likely want to alter only the numeric columns in this case, rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
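The numeric-only idea can also be sketched in base R. This is a minimal example, assuming the same toy data frame d with y built explicitly as a factor; the name num_cols is only illustrative:

```r
# Same toy data as above; y is made a factor explicitly
d <- data.frame(x = c(NA, 2, 3), y = factor(c("a", NA, "c")))

# Logical vector flagging which columns are numeric
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NA with 0 in the numeric columns only; the factor column is untouched
d[num_cols] <- lapply(d[num_cols], function(col) replace(col, is.na(col), 0))
d
```

Because the factor column is never assigned to, no "invalid factor level" warning is raised and its NA stays in place.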
A solution using mutate_all from dplyr, in case you want to add it to your dplyr pipeline:
library(dplyr)
df %>%
  mutate_all(~ ifelse(is.na(.), 0, .))  # formula lambda; funs() is deprecated in recent dplyr versions
Result:
A B C
1 1 1 2
2 2 2 5
3 3 1 2
4 0 2 0
5 1 1 0
6 2 2 0
7 3 1 3
8 0 2 0
9 1 1 3
10 2 2 3
11 3 1 0
12 0 2 3
13 1 1 4
14 2 2 4
15 3 1 0
16 0 2 0
17 1 1 1
18 2 2 0
19 3 1 2
20 0 2 0
If you only want to replace the NAs in numeric columns, which I assume is often the case in modeling, you can use mutate_if:
library(dplyr)
df %>%
  mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))  # formula lambda; funs() is deprecated in recent dplyr versions
or in base R:
replace(df, is.na(df), 0)  # replace() takes the object, the index, then the values
Result:
A B C
1 1 0 2
2 2 NA 5
3 3 0 2
4 0 NA 0
5 1 0 0
6 2 NA 0
7 3 0 3
8 0 NA 0
9 1 0 3
10 2 NA 3
11 3 0 0
12 0 NA 3
13 1 0 4
14 2 NA 4
15 3 0 0
16 0 NA 0
17 1 0 1
18 2 NA 0
19 3 0 2
20 0 NA 0
Data:
set.seed(123)
df <- data.frame(A=rep(c(0:3, NA), 5), B=rep(c("0", "NA"), 10), C=c(sample(c(0:5, NA), 20, replace = TRUE)))
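In dplyr 1.0.0 and later, scoped verbs such as mutate_if are superseded by across(). Here is a sketch of the same numeric-only replacement in that style. It uses a small made-up data frame (df_demo is hypothetical, not the data above) and assumes dplyr >= 1.0.0 is installed:

```r
library(dplyr)

# Hypothetical toy data: two numeric columns and one character column
df_demo <- data.frame(
  A = c(0, NA, 2),
  B = c("0", "NA", NA),
  C = c(NA, 5, NA),
  stringsAsFactors = FALSE
)

# Replace NA with 0 in numeric columns only; B is left as it is
out <- df_demo %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))
out
```

replace() is used instead of ifelse() here so that each column keeps its original type and attributes.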
To add to James's example, it seems you always have to create an intermediate when performing calculations on NA-containing data frames.
For instance, adding two columns (A and B) together from a data frame dfr:
temp.df <- data.frame(dfr) # copy the original
temp.df[is.na(temp.df)] <- 0
dfr$C <- temp.df$A + temp.df$B # or any other calculation
remove('temp.df')
When I do this, I throw away the intermediate afterwards with remove/rm.
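For simple arithmetic, a named intermediate may not be strictly necessary. The sketch below shows two possible alternatives on a made-up dfr with columns A and B (the data and the na.zero helper are assumptions for illustration): rowSums with na.rm = TRUE for sums, and zero-filling each column inline for other calculations.

```r
# Hypothetical data with NAs in both columns
dfr <- data.frame(A = c(1, NA, 3), B = c(NA, 4, 5))

# Option 1: treat NA as 0 only while summing the two columns
dfr$C <- rowSums(dfr[c("A", "B")], na.rm = TRUE)

# Option 2: zero-fill each column inline with a small helper
na.zero <- function(x) replace(x, is.na(x), 0)
dfr$D <- na.zero(dfr$A) * na.zero(dfr$B)

dfr
```

Neither option modifies columns A and B, so the original NAs remain available for later checks.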