Question
I have a dataframe of over 1 million rows, with one column for each hour of the day. I want to mutate each value in those columns, but the modification depends on the sign of the value. How can I do that efficiently?
I could do a gather on those hourly values (then spread), but gather seems to be pretty slow on big dataframes. I could also just write the same mutate for all 24 columns, but that does not seem like a great solution when mutate_at looks able to do exactly that.
I'll probably have to do this kind of mutate again in the near future, and I hope to find something better than repetitive, tedious-to-read code.
library(data.table)
df = data.table(
  "ID" = c(1,1,1,2,2), # not relevant here
  "Date" = c(1,2,3,1,2), # not relevant here
  "total_neg" = c(1,1,0,0,2),
  "total_pos" = c(4,5,2,4,5),
  "H1" = c(5,4,0,5,-5),
  "H2" = c(5,-10,5,5,-5),
  "H3" = c(-10,6,5,0,10)
)
I want to apply something like
df%>%
mutate_at(c("H1", "H2", "H3"), FUN(ifelse( Hour < 0, Hour*total_neg/10, Hour*total_pos/10)))
Here Hour stands for the value in each selected column. This obviously doesn't work as written, and neither does ".", but I'm looking for something that would mean "any value in the columns selected in mutate_at".
If it helps, I'm currently denormalizing some values using the sums of the actual positive and negative values, which are stored in two columns.
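To make the intended rule concrete, here is a minimal base R sketch for a single column, written out by hand. The standalone vectors are copied from the example above, and the name h1_scaled is only illustrative:

```r
# Values copied from the H1, total_neg and total_pos columns of the example
h1        <- c(5, 4, 0, 5, -5)
total_neg <- c(1, 1, 0, 0, 2)
total_pos <- c(4, 5, 2, 4, 5)

# Negative values are scaled by total_neg, all other values by total_pos
h1_scaled <- ifelse(h1 < 0, h1 * total_neg / 10, h1 * total_pos / 10)
print(h1_scaled)
```

The goal is to apply this same rule to every hourly column without repeating it 24 times.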
In my example, this would be the expected result:
df = data.table(
  "ID" = c(1,1,1,2,2),
  "Date" = c(1,2,3,1,2),
  "total_neg" = c(1,1,0,0,2),
  "total_pos" = c(4,5,2,4,5),
  "H1" = c(2,2,0,2,-1),
  "H2" = c(2,-1,1,2,-1),
  "H3" = c(-1,3,1,0,5)
)
df
Thanks in advance for any help you may provide, and apologies for any mistakes: I'm not a native speaker, but I assure you I'm doing my best!
Answer 1:
FUN is not an argument of mutate_at. In recent versions of dplyr, the previously used funs() is deprecated in favor of list(~ ...) or simply ~. Also, wrap the columns to select in vars(). The column names can also be unquoted, or selected with vars(starts_with("H")) or vars(matches("^H\\d+$")). Finally, replace 'Hour' with . (a dot), which stands for the values of the column currently being processed.
library(dplyr)
df %>%
  mutate_at(vars(c("H1", "H2", "H3")),
            ~ ifelse(. < 0, . * total_neg / 10, . * total_pos / 10))
#   ID Date total_neg total_pos H1 H2 H3
# 1  1    1         1         4  2  2 -1
# 2  1    2         1         5  2 -1  3
# 3  1    3         0         2  0  1  1
# 4  2    1         0         4  2  2  0
# 5  2    2         2         5 -1 -1  5
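Two variations that go beyond the answer above, offered only as a sketch. In dplyr 1.0.0 and later, mutate_at is superseded by across() inside mutate(). Since the example data is a data.table, the same rule can also be applied by reference with := and .SDcols, assuming a data.table version that provides fifelse(). The names dt, res_across, res_dt and hour_cols are illustrative and do not come from the original post:

```r
library(dplyr)
library(data.table)

# Same example data as in the question
dt <- data.table(
  ID        = c(1, 1, 1, 2, 2),
  Date      = c(1, 2, 3, 1, 2),
  total_neg = c(1, 1, 0, 0, 2),
  total_pos = c(4, 5, 2, 4, 5),
  H1        = c(5, 4, 0, 5, -5),
  H2        = c(5, -10, 5, 5, -5),
  H3        = c(-10, 6, 5, 0, 10)
)

# dplyr (assumes >= 1.0.0): across() selects the hour columns,
# and .x is the column currently being processed
res_across <- as.data.frame(dt) %>%
  mutate(across(matches("^H\\d+$"),
                ~ ifelse(.x < 0, .x * total_neg / 10, .x * total_pos / 10)))

# data.table: overwrite the hour columns by reference, on a copy so that
# dt itself is left untouched
hour_cols <- grep("^H[0-9]+$", names(dt), value = TRUE)
res_dt <- copy(dt)
res_dt[, (hour_cols) := lapply(.SD, function(h) {
  fifelse(h < 0, h * total_neg / 10, h * total_pos / 10)
}), .SDcols = hour_cols]

print(res_across)
print(res_dt)
```

Neither variant needs a gather/spread round trip. Which one is faster on a million rows is worth benchmarking on the real data rather than assuming.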
Source: https://stackoverflow.com/questions/57329163/how-to-mutate-at-multiple-columns-on-a-condition-on-each-value