Replace all 0 values to NA


I have a data frame with some numeric columns. Some rows have a 0 value, which should be treated as null in statistical analysis. What is the fastest way to replace all the 0

8 answers

夕颜

2020-11-22 08:58
In case anyone arrives here via Google looking for the opposite (i.e. how to replace all NAs in a data.frame with 0), the answer is:
```
df[is.na(df)] <- 0
```
Or, using dplyr / tidyverse:
```
library(dplyr)
mtcars %>% replace(is.na(.), 0)
```

花落未央

2020-11-22 09:05
Replacing all zeroes with NA:
```
df[df == 0] <- NA
```
Explanation

1. NULL is not what you want to replace zeroes with. As it says in ?'NULL',

NULL represents the null object in R

which is unique and, I guess, can be seen as the most uninformative and empty object.¹ Then it becomes not so surprising that
```
data.frame(x = c(1, NULL, 2))
#   x
# 1 1
# 2 2
```
That is, R does not reserve any space for this null object.² Meanwhile, looking at ?'NA' we see that

NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw.

Importantly, NA is of length 1 so that R reserves some space for it. E.g.,
```
data.frame(x = c(1, NA, 2))
#    x
# 1  1
# 2 NA
# 3  2
```
Also, the data frame structure requires all the columns to have the same number of elements so that there can be no "holes" (i.e., NULL values).

Now, you could "replace zeroes by NULL" in a data frame in the sense of completely removing every row that contains at least one zero. When using, e.g., var, cov, or cor, that is equivalent to first replacing zeroes with NA and then setting the use argument to "complete.obs". Typically, however, this is unsatisfactory, as it discards the non-zero values in those rows as well.

2. Instead of running some sort of loop, the solution relies on the vectorized comparison df == 0. It returns (try it) a logical matrix with the same dimensions as df, with entries TRUE and FALSE. We are also allowed to pass this matrix to the subsetting operator [...] (see ?'['). Lastly, while the result of df[df == 0] is perfectly intuitive, it may seem strange that df[df == 0] <- NA gives the desired effect. Subset assignment does not work this way for every kind of object, but it does for data frames; see ?'[<-.data.frame'.

¹ The empty set in set theory feels somehow related.
² Another similarity with set theory: the empty set is a subset of every set, but we do not reserve any space for it.
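The two points above can be sketched on a small made-up data frame. The data and the names df, mask, df_na, and no_zero_rows are hypothetical and chosen only for illustration; this is a minimal sketch, not part of the original answer.

```r
# Made-up example data (assumption: all columns are numeric)
df <- data.frame(a = c(1, 0, 3, 4, 5, 6),
                 b = c(2, 5, 0, 1, 7, 3))

# Point 2: the comparison is vectorized and yields a logical matrix
mask <- df == 0
print(is.matrix(mask))   # a matrix, not a data frame
print(dim(mask))         # same dimensions as df

# The matrix can be used directly in a subset assignment
df_na <- df
df_na[mask] <- NA
print(df_na)

# Point 1: dropping every row that contains a zero ...
no_zero_rows <- df[rowSums(mask) == 0, ]

# ... matches NA replacement combined with use = "complete.obs"
print(all.equal(var(df_na, use = "complete.obs"), var(no_zero_rows)))
print(all.equal(cor(df_na, use = "complete.obs"), cor(no_zero_rows)))
```

Keeping the NA version is usually preferable, because functions that work column by column (for example colMeans(df_na, na.rm = TRUE)) can still use the non-zero values in rows that would otherwise be dropped.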

深忆病人

2020-11-22 09:16

```
# Sample data
set.seed(1)
dat <- data.frame(x = sample(0:2, 5, TRUE), y = sample(0:2, 5, TRUE))
dat
#   x y
# 1 0 2
# 2 1 2
# 3 1 1
# 4 2 1
# 5 0 0

# Replace zeros with NA
dat[dat == 0] <- NA
dat
#    x  y
# 1 NA  2
# 2  1  2
# 3  1  1
# 4  2  1
# 5 NA NA
```


自闭症患者

2020-11-22 09:17
Let me assume that your data.frame is a mix of different datatypes and not all columns need to be modified.

To modify only columns 12 to 18 (of 21 in total), do this:
```
df[, 12:18][df[, 12:18] == 0] <- NA
```
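When the column positions are not known in advance, the same idiom can be driven by column type instead. The following is a minimal sketch under that assumption; the example data and the name num_cols are hypothetical.

```r
# Made-up mixed-type data; note the character column also contains "0"
df <- data.frame(id = c("0", "b", "c"),
                 x  = c(0, 1, 2),
                 y  = c(3, 0, 0),
                 stringsAsFactors = FALSE)

# Logical vector marking the numeric columns
num_cols <- vapply(df, is.numeric, logical(1))

# Same pattern as above, restricted to the numeric columns
df[num_cols][df[num_cols] == 0] <- NA
print(df)
```

The id column is left alone, including its "0" string, because the comparison and the assignment only ever see the numeric columns.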

醉酒成梦

2020-11-22 09:17

dplyr::na_if() is an option:

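```
library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(col1 = c(1, 2, 3, 0),
             col2 = c(0, 2, 3, 4),
             col3 = c(1, 0, 3, 0),
             col4 = c('a', 'b', 'c', 'd'))

# Apply na_if() to each numeric column; na_if(df, 0) on a whole
# data frame is rejected by newer dplyr releases
df %>% mutate(across(where(is.numeric), ~ na_if(.x, 0)))
# A tibble: 4 x 4
#    col1  col2  col3 col4
#   <dbl> <dbl> <dbl> <chr>
# 1     1    NA     1 a
# 2     2     2    NA b
# 3     3     3     3 c
# 4    NA     4    NA d
```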


感动是毒

2020-11-22 09:18
You can replace 0 with NA only in numeric fields (i.e. excluding things like factors), but it works on a column-by-column basis:
```
col[col == 0 & is.numeric(col)] <- NA
```
With a function, you can apply this to your whole data frame:
```
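changetoNA <- function(colnum, df) {
    col <- df[, colnum]
    if (is.numeric(col)) {   # only numeric columns are modified
        col[col == 0] <- NA  # compare against 0, not -1
    }
    return(col)
}
cols <- seq_along(df)  # or a narrower range such as 1:5
# lapply keeps each column's type; sapply would collapse mixed types into one matrix
df[cols] <- lapply(cols, changetoNA, df)
```
Here cols covers every column via seq_along(df), which is equivalent to 1:ncol(df) for a non-empty data frame; use a narrower range such as 1:5 to process only some of the columns.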