问题
I have a datatable with missing data and two columns supposed to help to replace those missing data which looks like :
library(data.table)
Data = data.table(
"H1" = c(NaN,4,NaN),
"H2" = c(5,NaN,NaN),
"H3" = c(7,NaN,NaN),
"Group" = c(1,2,1),
"Factor" = c(2,3,4)
)
H1 H2 H3 Group Factor
1: NaN 5 7 1 2
2: 4 NaN NaN 2 3
3: NaN NaN NaN 1 4
I would like to use a second dataframe for that
Groups = data.table(
"H1" = c(1,2,3),
"H2" = c(4,5,6),
"H3" = c(7,8,9),
"Group" = c(1,2,3)
)
H1 H2 H3 Group
1: 1 4 7 1
2: 2 5 8 2
3: 3 6 9 3
The column "Group" in the dataframe Groups may or may not be useful as it's basically the row number here.
I think about something writing something close to
Data%>%
mutate_at(vars(matches("^H\\d+$")), ~ifelse(is.na(.),
Groups[Group, Hour]*Factor, .))
But obviously, "Hour" is undefined but hopefully could be written like :
as.numeric(substr(columnName, 2, nchar(columnName)))
Expected result :
H1 H2 H3 Group Factor
1: 2 5 7 1 2
2: 4 15 24 2 3
3: 4 16 28 1 4
How could I get that columnName ?
Additional issue : When I replace Hour by 2 in this command, for testing purpose. The whole column Group is considered and not only the Group value of the current line, and I don't understand why.
Any solution that would not consist into repeating mutate commands for each one of my columns, but gets the job done is much appreciated !
This Question may be related to my issue, but I fail to use that "deparse(substitute(.))" command.
回答1:
Here is a solution with mutate_at
and map2
.
library(purrr)
library(dplyr)
# Define columns for use later
cols_x <- paste0("H", 1:3, ".x")
cols_y <- paste0("H", 1:3, ".y")
# Multiply hours by factor
df <- left_join(Data, Groups, by = "Group") %>%
mutate_at(cols_y, ~ . * Factor)
# Replace values if missing
df <- as.data.frame(map2(cols_x, cols_y, ~ ifelse(is.na(df[[.x]]), df[[.y]], df[[.x]]))) %>%
setNames(gsub(".x", "", cols_x))
来源:https://stackoverflow.com/questions/57976669/how-to-use-the-name-of-the-column-im-currently-using-in-a-mutate-at