问题
I have a dataset that includes some non-referenced data that I would like to replace with NA. In the following example, if the data in columns rep1 to rep4 does not match one of the values in the ID column, I would like to replace the value with NA. In this case, the values of x, y, and z aren't listed in the ID column, so they should be replaced.
This is a somewhat similar question that I asked earlier here : If data present, replace with data from another column based on row ID
I think the solution will be similar to what was given in the previous question, but I don't know how to alter the second portion ~ value[match(., ID)]
to return NA for values that aren't listed in the ID column.
df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])
ID rep1 rep2 rep3 rep4
a
b a
c a b
d a b c
e a b c d
f
g x
h
i
j y z
k z
l
m
The result should look like this:
ID rep1 rep2 rep3 rep4
a
b a
c a b
d a b c
e a b c d
f
g NA
h
i
j NA NA
k NA
l
m
Here is the data using dput()
structure(list(ID = structure(1:13, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"),
rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L,
5L, 1L, 1L), .Label = c("", "a", "x", "y", "z"), class = "factor"),
rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
1L, 1L, 1L), .Label = c("", "b", "z"), class = "factor"),
rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"d"), class = "factor")), class = "data.frame", row.names = c(NA, -13L))
回答1:
An dplyr
alternative using replace()
df %>%
mutate_at(vars(rep1:rep4), ~replace(., which(!(. %in% ID | . == "")), NA))
ID rep1 rep2 rep3 rep4
1 a
2 b a
3 c a b
4 d a b c
5 e a b c d
6 f
7 g <NA>
8 h
9 i
10 j <NA> <NA>
11 k <NA>
12 l
13 m
回答2:
We can use sapply
and replace values to NA
if they are not present in ID
or have blank value.
df[!(sapply(df, `%in%`, df$ID) | df == '')] <- NA
df
# ID rep1 rep2 rep3 rep4
#1 a
#2 b a
#3 c a b
#4 d a b c
#5 e a b c d
#6 f
#7 g <NA>
#8 h
#9 i
#10 j <NA> <NA>
#11 k <NA>
#12 l
#13 m
来源:https://stackoverflow.com/questions/58335382/if-values-in-a-range-of-columns-arent-present-in-another-column-replace-with-n