Question
This is a bit difficult to explain, but I have a data frame whose values look like a staircase: for each date, some of the trailing columns are NA, and the number of filled columns differs from date to date. I want to create a new column that holds the value of the last non-NA column in each row.
Hopefully it makes more sense with this example:
Sample dataframe:
test <- data.frame("date" = c(as.Date("2020-01-01"), as.Date("2020-01-02"), as.Date("2020-01-03")),
                   "a" = c(4, 3, 4),
                   "b" = c(NA, 2, 1),
                   "c" = c(NA, NA, 5))
Desired output:
date        val
2020-01-01    4
2020-01-02    2
2020-01-03    5
I'd also prefer not to do something like take the row number of the date and take that column number + 1, but if that's the only way to do it, that's that. Thanks!
Answer 1:
You can use max.col with ties.method set to "last" to get the last non-NA value in each row.
test$val <- test[cbind(1:nrow(test), max.col(!is.na(test), ties.method = 'last'))]
test
# date a b c val
#1 2020-01-01 4 NA NA 4
#2 2020-01-02 3 2 NA 2
#3 2020-01-03 4 1 5 5
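One thing to watch for: because test also contains the date column, the matrix-style indexing above goes through as.matrix(test), which is likely a character matrix, so val may come back as character rather than numeric. A minimal sketch that keeps val numeric, assuming the value columns are exactly a, b and c (the helper names vals and last_col are only for illustration):

```r
test <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  a = c(4, 3, 4),
  b = c(NA, 2, 1),
  c = c(NA, NA, 5)
)

# Numeric matrix of the value columns only (the date column is left out).
vals <- as.matrix(test[c("a", "b", "c")])

# For each row, the position of the last TRUE in the "is not NA" mask.
last_col <- max.col(!is.na(vals), ties.method = "last")

# Indexing a matrix with (row, column) pairs picks one value per row.
test$val <- vals[cbind(seq_len(nrow(vals)), last_col)]
```

Here last_col holds the column position within vals, not within test, which is why the lookup is done on vals.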
Answer 2:
Here's a tidyverse-based approach: convert the columns to rows using pivot_longer, then get the last row where the value isn't NA for each date:
library(dplyr)
library(tidyr)
test %>%
  pivot_longer(-date) %>%
  filter(!is.na(value)) %>%
  group_by(date) %>%
  summarize(value = tail(value, 1), .groups = "drop")
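The pipeline above returns one row per date rather than adding a column to test. If the original columns should be kept, one option is to join the summary back on date. This is only a sketch: the names last_vals and result are made up for illustration, and it assumes each date appears once in test.

```r
library(dplyr)
library(tidyr)

test <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  a = c(4, 3, 4),
  b = c(NA, 2, 1),
  c = c(NA, NA, 5)
)

# One row per date, holding the last non-NA value in column order.
last_vals <- test %>%
  pivot_longer(-date) %>%
  filter(!is.na(value)) %>%
  group_by(date) %>%
  summarize(val = last(value), .groups = "drop")

# Attach the summary to the original wide data frame.
result <- left_join(test, last_vals, by = "date")
```

A date whose columns are all NA is dropped by the filter step, so after the join its val would be NA.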
Answer 3:
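You can also do this with dplyr's coalesce function, which takes the first non-missing element from the provided vectors. Listing the columns from last to first (c, b, a) makes that first non-missing element the last non-NA column.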
library(dplyr)
test %>%
  mutate(val = coalesce(c, b, a))
#> date a b c val
#> 1 2020-01-01 4 NA NA 4
#> 2 2020-01-02 3 2 NA 2
#> 3 2020-01-03 4 1 5 5
Created on 2020-07-07 by the reprex package (v0.3.0)
Note that if you have many columns, @tfehring's and @Ronak's solutions will be better suited, since this method requires you to specify the columns manually. It does have the benefit of being short and sweet, though.
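If typing out the columns is the main obstacle, the reversed column list can probably be built programmatically and passed to coalesce with do.call. A rough sketch, assuming the value columns are listed in value_cols (a name made up for illustration) in left-to-right order:

```r
library(dplyr)

test <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  a = c(4, 3, 4),
  b = c(NA, 2, 1),
  c = c(NA, NA, 5)
)

# Hypothetical: the columns to search, in left-to-right order.
value_cols <- c("a", "b", "c")

# Reverse the columns so coalesce() sees the last column first,
# then pass them as separate, unnamed arguments via do.call().
test$val <- do.call(coalesce, unname(rev(as.list(test[value_cols]))))
```

This is equivalent to coalesce(c, b, a) for the sample data, without naming each column in the call.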
Source: https://stackoverflow.com/questions/62785827/how-to-get-value-of-last-non-na-column