How to replace NA with most recent non-NA by group? [duplicate]

清歌不尽

8 answers

Happy的楠姐

2020-11-30 14:48

Depending upon what you are doing next, you may prefer the data in a nested form.

```
library(dplyr)

(nested <- df %>% 
  group_by(name) %>% 
  summarize(
    age = na.omit(age)[1], 
    birthplace = na.omit(birthplace)[1], 
    value = list(value)
  )
)
## # A tibble: 4 x 4
##     name   age birthplace     value
##   <fctr> <dbl>     <fctr>    <list>
## 1      A    28      city1 <int [2]>
## 2      B    NA      city2 <int [3]>
## 3      C    NA         NA <int [1]>
## 4      D    53         NA <int [2]>
```

If you need to compute on individual values, you can always unnest it later.

```
nested %>% tidyr::unnest(cols = value)
## # A tibble: 8 x 4
##     name   age birthplace value
##   <fctr> <dbl>     <fctr> <int>
## 1      A    28      city1   100
## 2      A    28      city1   101
## 3      B    NA      city2   102
## 4      B    NA      city2   103
## 5      B    NA      city2   104
## 6      C    NA         NA   105
## 7      D    53         NA   106
## 8      D    53         NA   107
```


轮回少年

2020-11-30 14:54
You can wrap `na.locf()` (from zoo) in `do()`:
```
library(dplyr)
library(zoo)

df %>% group_by(name) %>% do(na.locf(., na.rm = FALSE))
```
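The `na.rm = FALSE` argument is what keeps each group the same length: by default `na.locf()` drops leading NAs that have no earlier value to carry forward, so a group starting with NA would come back shorter than its rows. A minimal illustration on a made-up vector:

```r
library(zoo)

x <- c(NA, 1, NA, 2, NA)  # made-up input with a leading NA

## Default na.rm = TRUE: the leading NA is dropped, so the result is shorter than x
short <- na.locf(x)                        # 1 1 2 2
## na.rm = FALSE: the leading NA is kept, so the result lines up with x
same_length <- na.locf(x, na.rm = FALSE)   # NA 1 1 2 2
```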
爱一瞬间的悲伤

2020-11-30 14:59
Here is a base R solution. The `fill` function calls `ave()` with `na.omit(x)[1]` as the per-group function, as in Richie Cotton's solution.
```
fill <- function(...) ave(..., FUN = function(x) na.omit(x)[1])
transform(df, birthplace = fill(birthplace, name), age = fill(age, name))
```
Note: this also works with `na.locf()`; replace `fill` with:
```
library(zoo)
fill <- function(...) ave(..., FUN = function(x) na.locf(x, na.rm = FALSE))
```
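A small illustration of what `ave()` does with that per-group function, on made-up vectors: it splits `v` by `grp`, applies `FUN` to each piece, and writes the results back in the original order, recycling a length-one result over the whole group. `first_non_na` is a hypothetical name for the anonymous function above.

```r
v   <- c(NA, 5, NA, NA, 7, NA)           # made-up values
grp <- c("a", "a", "b", "b", "b", "c")   # made-up groups

## First non-missing value in a group (NA if the group has none)
first_non_na <- function(x) na.omit(x)[1]

out <- ave(v, grp, FUN = first_non_na)   # 5 5 7 7 7 NA
```

Unlike `na.locf()`, this spreads one value over the entire group, including rows that come before it, and a group with no non-missing value stays NA.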
青春惊慌失措

2020-11-30 15:06
As another base R solution, here is a poor man's `na.locf()`:
```
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}
```
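To see why the indexing works: `keep` marks the first element plus every non-missing one, `cumsum(keep)` gives each position the index of the latest kept value at or before it, and `v[keep][cumsum(keep)]` looks those values up. A trace on a made-up vector:

```r
## Same helper as above
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}

v <- c(1, NA, NA, 4, NA)              # made-up input
keep <- c(TRUE, !is.na(v[-1]))        # TRUE FALSE FALSE TRUE FALSE
idx  <- cumsum(keep)                  # 1 1 1 2 2: index of the latest kept value
kept <- v[keep]                       # 1 4
filled <- fill_down(v)                # 1 1 1 4 4
lead_na <- fill_down(c(NA, 2, NA))    # NA 2 2: a leading NA has nothing to carry
```

Keeping the first element unconditionally guarantees that `cumsum(keep)` starts at 1, so a leading NA simply stays NA.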
To fill down by group, use `tapply()` to split the vector and apply `fill_down()` to each group, and `split<-` to put the results back in their original positions:
```
fill_down_by_group <- function(v, grp) {
    ## original 'by hand':
    ##     split(v, grp) <- tapply(v, grp, fill_down)
    ##     v
    ## done by built-in function `ave()`
    ave(v, grp, FUN=fill_down)
}
```
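A small made-up example of why the grouping matters, and that the 'by hand' `split<-`/`tapply()` version in the comment gives the same result as `ave()`:

```r
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}

v   <- c(1, NA, NA, 3, NA)         # made-up values
grp <- c("x", "x", "y", "y", "y")  # made-up groups

## Without grouping, the 1 from group "x" leaks into group "y"
ungrouped <- fill_down(v)                  # 1 1 1 3 3
## With ave(), each group is filled independently
grouped <- ave(v, grp, FUN = fill_down)    # 1 1 NA 3 3

## The 'by hand' version: split, apply per group, reassemble in place
by_hand <- v
split(by_hand, grp) <- tapply(by_hand, grp, fill_down)
```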
To process multiple columns, one might use `lapply()`:
```
elts <- c("age", "birthplace")
df[elts] <- lapply(df[elts], fill_down_by_group, df$name)
```
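Applied to a small hypothetical data frame with the question's column names (the values are invented), the two lines above fill `age` and `birthplace` within each `name`:

```r
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}
fill_down_by_group <- function(v, grp) ave(v, grp, FUN = fill_down)

## Hypothetical toy data, not the question's actual rows
df <- data.frame(
    name       = c("A", "A", "B", "B", "B"),
    age        = c(28, NA, NA, 41, NA),
    birthplace = c("city1", NA, NA, "city2", NA),
    stringsAsFactors = FALSE
)

elts <- c("age", "birthplace")
## lapply() passes each column as `v`; the extra argument df$name becomes `grp`
df[elts] <- lapply(df[elts], fill_down_by_group, df$name)
## age:        28 28 NA 41 41
## birthplace: "city1" "city1" NA "city2" "city2"
```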
Notes
1. I would be interested in seeing how a dplyr solution handles many columns without hard-coding each one. Answering my own question, I guess it is
```
library(dplyr); library(tidyr)
## fill_() is deprecated in current tidyr; pass the column names through all_of()
df %>% group_by(name) %>% fill(all_of(elts))
```
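2. A more efficient base solution, when the rows of each group are already contiguous (e.g., `identical(grp, sort(grp))`), is
```
fill_down_by_grouped <- function(v, grp) {
    if (length(v) > 1) {
        ## keep the first row of each group plus every non-NA value; keying
        ## duplicated() on grp stops values carrying over into the next group
        keep <- !(duplicated(grp) & is.na(v))
        v[keep][cumsum(keep)]
    } else v
}
```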
3. For me, `fill_down()` on a vector of about 10M elements takes ~225 ms; `fill_down_by_grouped()` takes ~300 ms regardless of the number of groups; `fill_down_by_group()` scales with the number of groups, taking ~2 s for 10,000 groups and about 36 s for 10M groups.
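The single-pass idea for contiguous groups can be sketched as below, assuming each group's rows are adjacent, with `duplicated()` keyed on the grouping vector so the first row of every group is always kept and no value carries across a group boundary. `fill_down_contiguous()` is a hypothetical name; the data sizes are arbitrary and the printed timings depend on the machine, so only the equality checks are meaningful.

```r
## Per-group fill via ave(), as in fill_down_by_group()
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}
fill_down_by_group <- function(v, grp) ave(v, grp, FUN = fill_down)

## Hypothetical single-pass version; assumes each group's rows are contiguous.
## A position is kept if it starts a new group or holds a non-NA value.
fill_down_contiguous <- function(v, grp) {
    keep <- !(duplicated(grp) & is.na(v))
    v[keep][cumsum(keep)]
}

## Small check: the NA that starts group "y" is not filled from group "x"
small <- fill_down_contiguous(c(1, NA, NA, 3, NA), c("x", "x", "y", "y", "y"))

## Larger made-up input: 1e5 values, about half NA, in up to 1e4 sorted groups
set.seed(1)
n   <- 1e5
grp <- sort(sample(1e4, n, replace = TRUE))
v   <- ifelse(runif(n) < 0.5, NA, runif(n))

t_ave <- system.time(a <- fill_down_by_group(v, grp))
t_one <- system.time(b <- fill_down_contiguous(v, grp))
## Elapsed seconds; machine-dependent
print(c(ave = t_ave[["elapsed"]], contiguous = t_one[["elapsed"]]))
```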

有刺的猬

2020-11-30 15:07

Consider also a nested-apply base R solution that runs a rolling `head()` over each column: for row `i`, it takes the first value of the column among rows `1..i` with the same `name`.

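```
## subset rows 1..i first, so the logical index has the same length as the
## values it selects and is never recycled across the remaining rows
df <- setNames(data.frame(lapply(names(df), function(d)
               sapply(seq_len(nrow(df)), function(i)
                      head(df[seq_len(i), d][df$name[seq_len(i)] == df$name[i]], 1))
        )), names(df))
```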


春和景丽

2020-11-30 15:12
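You could do this through a merge too: join `df` to itself on the `name` column, then group by `value` (which is unique to each row here) and take the non-missing `age` and `birthplace` within each group.
```
library(sqldf)
## max() ignores NULLs, so it returns a non-missing value from the group
## (the largest, if there are several); a bare column in a GROUP BY query
## would instead come from an arbitrary row
sqldf('select t1.name, max(t2.age) as age, max(t2.birthplace) as birthplace, t1.value
       from df t1 inner join df t2 on t1.name = t2.name
       group by t1.value')
```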
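For comparison, a base R sketch of the merge idea without SQL. It is a swapped-in variant, not a translation of the query: it first collapses each name to its first non-missing `age` and `birthplace` with `aggregate()`, then joins those back onto every row with `merge()`. The data frame and the `first_non_na` helper are hypothetical.

```r
## Hypothetical toy data, not the question's actual rows
df <- data.frame(
    name       = c("A", "A", "B", "B"),
    age        = c(28, NA, NA, 40),
    birthplace = c(NA, "city1", "city2", NA),
    value      = 101:104,
    stringsAsFactors = FALSE
)

first_non_na <- function(x) na.omit(x)[1]

## One row per name holding its first non-missing age and birthplace
lookup <- aggregate(df[c("age", "birthplace")], by = list(name = df$name),
                    FUN = first_non_na)

## Join the per-name values back onto every row, then restore the row order
out <- merge(df[c("name", "value")], lookup, by = "name")
out <- out[order(out$value), ]
```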