Set NA to 0 in R

After merging a data frame with another, I'm left with NAs in the occasional row. I'd like to set these NAs to 0 so I can perform calculations with them.
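For context, here is a minimal sketch of how such NAs typically arise: a left join via `merge(..., all.x = TRUE)` fills unmatched rows with NA. The data frames `left` and `right` and their columns are made up for illustration and are not from the question.

```r
# Hypothetical data: `right` has no row for id 3
left  <- data.frame(id = 1:3, x = c(10, 20, 30))
right <- data.frame(id = 1:2, y = c(5, 7))

# Keep every row of `left`; unmatched rows get NA in `y`
merged <- merge(left, right, by = "id", all.x = TRUE)

is.na(merged$y)  # TRUE only for the unmatched row (id 3)
sum(merged$y)    # NA, because NA propagates through arithmetic
```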

4 Answers

你的背包

2020-11-29 03:33

Why not try this:
```
na.zero <- function (x) {
    x[is.na(x)] <- 0
    return(x)
}
na.zero(df)
```
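One thing worth noting: R functions work on copies, so calling `na.zero(df)` on its own only prints a cleaned copy. A small sketch, using a made-up `df`, of assigning the result back:

```r
na.zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

# Made-up example data, not from the question
df <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 3))

na.zero(df)        # returns a cleaned copy; df itself is unchanged
any(is.na(df))     # still TRUE

df <- na.zero(df)  # assign the result to keep the change
any(is.na(df))     # now FALSE
```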

情歌与酒

2020-11-29 03:37
You can just use the output of is.na to replace directly with subsetting:
```
bothbeams.data[is.na(bothbeams.data)] <- 0
```
Or with a reproducible example:
```
dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
  x y
1 1 0
2 2 4
3 3 5
4 0 6
```
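However, be careful using this method on a data frame containing factors that also have missing values (before R 4.0.0, data.frame() converted strings to factors by default; the example sets stringsAsFactors = TRUE so it behaves the same in newer versions):
```
> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)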
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
```
It "works":
```
> d
  x    y
1 0    a
2 2 <NA>
3 3    c
```
...but in this case you will likely want to alter only the numeric columns rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
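A base R sketch of restricting the replacement to numeric columns, as one possible way to do it (the name `num_cols` is just for illustration):

```r
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"))

# Flag the numeric columns, then replace NA only inside them
num_cols <- vapply(d, is.numeric, logical(1))
d[num_cols][is.na(d[num_cols])] <- 0

d$x  # the NA in x has been replaced by 0
d$y  # the missing value in y is left as NA
```

This works whether `y` is a factor or a character column, since non-numeric columns are never touched.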

日久生厌

2020-11-29 03:42

A solution using mutate_all from dplyr in case you want to add that to your dplyr pipeline:

```
library(dplyr)
# funs() is deprecated since dplyr 0.8; a formula does the same job
df %>%
  mutate_all(~ ifelse(is.na(.), 0, .))
```

If you only want to replace the NAs in numeric columns, which I assume is often the case in modeling, you can use mutate_if:

```
library(dplyr)
# funs() is deprecated since dplyr 0.8; a formula does the same job
df %>%
  mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
```

or in base R:

```
# replace(x, list, values): the data frame itself is the first argument
replace(df, is.na(df), 0)
```

Data:

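```
set.seed(123)
# All columns need 20 rows, so A repeats its 5 values 4 times.
# B holds the string "NA", which is not a missing value.
df <- data.frame(A=rep(c(0:3, NA), 4), B=rep(c("0", "NA"), 10), C=c(sample(c(0:5, NA), 20, replace = TRUE)))
```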
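Since dplyr 1.0.0 the scoped verbs (`mutate_all`, `mutate_if`) have been superseded by `across()`. A sketch of the numeric-only replacement in that style, assuming a recent dplyr is installed and using a small made-up data frame rather than the one above:

```r
library(dplyr)

# Made-up example data; `B` holds the string "NA", which is not a missing value
df <- data.frame(A = c(0, 1, NA, 3),
                 B = c("0", "NA", "0", "NA"),
                 C = c(NA, 5, 2, NA))

# Replace NA with 0 in numeric columns only; other columns pass through unchanged
df_clean <- df %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

df_clean
```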


心在旅途

2020-11-29 03:44
To add to James's example, it seems you need to create an intermediate copy when performing calculations on NA-containing data frames if you want the original to keep its NAs.

For instance, adding two columns (A and B) together from a data frame dfr:
```
temp.df <- data.frame(dfr) # copy the original
temp.df[is.na(temp.df)] <- 0
dfr$C <- temp.df$A + temp.df$B # or any other calculation
remove('temp.df')
```
When I do this I throw away the intermediate afterwards with remove/rm.
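The same idea can be sketched with a small helper, so the zero-filled intermediate lives only inside the function and needs no explicit cleanup. The name `sum_na_as_zero` and the example data are hypothetical:

```r
# Hypothetical helper: the zero-filled copy exists only inside the function
sum_na_as_zero <- function(data, cols) {
  tmp <- data[cols]        # copy just the columns involved
  tmp[is.na(tmp)] <- 0     # treat missing values as 0
  unname(rowSums(tmp))     # tmp is discarded when the function returns
}

# Made-up example data
dfr <- data.frame(A = c(1, NA, 3), B = c(NA, 5, 6))
dfr$C <- sum_na_as_zero(dfr, c("A", "B"))

dfr  # A and B keep their NAs; C holds the row sums with NA counted as 0
```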