Using sapply on column with missing values

Posted by 放荡痞女 on 2019-12-11 18:47:50

Question


I understand in general what the apply family of functions does, but I'm having trouble using it to create a new column based on another column that contains missing values. I can accomplish my task with a for loop, but I'd like to speed it up by using apply-type functions.

Say I have a time series of indices that start from today and end several years from now. My original indices only exist for the first few years. I then want to artificially extend these indices using an assumed % change (let's say 10%) for the rest of the years and store this as a new column.

Here's my sample dataset:

data <- data.frame(
date = seq.Date(as.Date("2019-01-01"),as.Date("2021-01-01"),"3 months"),
index = c(1,1.2,1.4,1.5,1.6,1.7,NA,NA,NA)
)

I can now make a new column, index2, using a for loop:

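data$index2 <- 1
for (i in seq_len(nrow(data))) {
  if (!is.na(data$index[i])) {
    # Observed value: copy it across
    data$index2[i] <- data$index[i]
  } else {
    # Missing value: grow the previous row's index2 by 10%
    # (assumes the first row is never NA, otherwise i - 1 would be 0)
    data$index2[i] <- data$index2[i - 1] * 1.1
  }
}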

However, I can't figure out how I would accomplish this using the apply functions. Thanks again for any advice.


Answer 1:


Provided I understood correctly, this seems to be a job for lag:

library(dplyr)
data %>% mutate(index2 = if_else(!is.na(index), index, lag(index) * 1.1))
#        date index index2
#1 2019-01-01   1.0   1.00
#2 2019-04-01   1.2   1.20
#3 2019-07-01   1.4   1.40
#4 2019-10-01   1.5   1.50
#5 2020-01-01   1.6   1.60
#6 2020-04-01   1.7   1.70
#7 2020-07-01    NA   1.87
#8 2020-10-01    NA     NA
#9 2021-01-01    NA     NA

This reproduces your expected output (i.e. it replaces only the first NA); I may have misunderstood your problem statement but I don't see what *apply would have to do with this.
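Only the first NA is filled because lag(index) shifts the original column, so it never sees values that were just computed for index2. A minimal base-R sketch of the same shift (the name `shifted` is only illustrative, and `c(NA, head(index, -1))` stands in for dplyr's lag with its default settings):

```r
index <- c(1, 1.2, 1.4, 1.5, 1.6, 1.7, NA, NA, NA)

# Base-R stand-in for dplyr::lag(index): shift the original vector by one
shifted <- c(NA, head(index, -1))

# Row 7 sees the observed 1.7, but rows 8 and 9 see NA, and NA * 1.1 is NA
index2 <- ifelse(!is.na(index), index, shifted * 1.1)
print(index2)
```

Filling the later rows needs the previous result rather than the previous input, which a single vectorised shift cannot provide.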


You could implement an sapply solution like this

transform(data, index2 = c(index[1], sapply(seq_along(index)[-1], function(i)
    if (!is.na(index[i])) index[i] else index[i - 1] * 1.1)))
#        date index index2
#1 2019-01-01   1.0   1.00
#2 2019-04-01   1.2   1.20
#3 2019-07-01   1.4   1.40
#4 2019-10-01   1.5   1.50
#5 2020-01-01   1.6   1.60
#6 2020-04-01   1.7   1.70
#7 2020-07-01    NA   1.87
#8 2020-10-01    NA     NA
#9 2021-01-01    NA     NA

but this is not very pretty.


After your typo fix, the problem statement changes slightly and we need cumprod:

data %>%
    mutate(index2 = if_else(
        !is.na(index),
        index,
        index[which.max(index)] * cumprod(c(rep(1.0, sum(!is.na(index))), rep(1.1, sum(is.na(index)))))))
#        date index index2
#1 2019-01-01   1.0 1.0000
#2 2019-04-01   1.2 1.2000
#3 2019-07-01   1.4 1.4000
#4 2019-10-01   1.5 1.5000
#5 2020-01-01   1.6 1.6000
#6 2020-04-01   1.7 1.7000
#7 2020-07-01    NA 1.8700
#8 2020-10-01    NA 2.0570
#9 2021-01-01    NA 2.2627
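Note that index[which.max(index)] picks the largest value, which matches the last observed value only because the sample series is increasing. As a hedged sketch (not part of the original answer), here are two base-R alternatives that anchor on the previous or last observed value instead. The names `growth`, `index2_reduce` and `index2_cumprod` are made up for illustration, and the second option assumes all NAs sit at the end of the series with at least one observed value before them:

```r
index <- c(1, 1.2, 1.4, 1.5, 1.6, 1.7, NA, NA, NA)
growth <- 1.1  # assumed 10% change per period, as in the question

# Option 1: Reduce() carries the previous *result* forward, like the for loop.
# With accumulate = TRUE and no init, the first element seeds the recursion.
index2_reduce <- Reduce(
  function(prev, cur) if (is.na(cur)) prev * growth else cur,
  index,
  accumulate = TRUE
)

# Option 2: vectorised, assuming every NA is at the tail of the series.
n_obs <- sum(!is.na(index))
n_missing <- length(index) - n_obs
last_obs <- index[n_obs]  # last observed value, not necessarily the largest
index2_cumprod <- c(
  index[seq_len(n_obs)],
  last_obs * cumprod(rep(growth, n_missing))
)

print(index2_reduce)
print(index2_cumprod)
```

Reduce is the closest functional counterpart to the original loop because each step receives the previous output; it is not necessarily faster than the loop, whereas the cumprod form is fully vectorised.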


Source: https://stackoverflow.com/questions/54991972/using-sapply-on-column-with-missing-values
