How to calculate time-weighted average and create lags

爱一瞬间的悲伤 2021-01-27 22:29

I have searched the forum but found nothing that answers, or even hints at, how to do what I want.

I have yearly measurement of exposure data from

1 answer
  • 2021-01-27 22:57

    This is not an elegant answer, but here is what I tried. First I reshaped the data frame into long format and added an id for each subject so that I could identify each subject's key (base) year. In the reshaped data, variable holds the year taken from the column names of your original data set (e.g., pol_2000), entryYear and entryMonth are both extracted from entry, and check flags the row whose year equals the entry year, i.e. the base year for each participant. Next, I extracted six rows for each participant (the base year and the five preceding years) using getMyRows from the SOfun package. Then I used lapply to do the math as you described in your question. For the two- and five-year averages, I divided the weighted totals by the number of years (2 or 5). I was not sure what the final output should look like, so I took the base-year row for each subject and added three columns (one, two, five) to it.
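
    Written out, the weighting applied by the code below appears to be the following. This is my reading of the code, and the symbols are introduced here only for illustration: y is the entry year, v_y the yearly value for year y, and m the entryMonth value (the month number minus 1, i.e. the months of the entry year that have already passed).

    ```latex
    \begin{aligned}
    \text{one}  &= \frac{m}{12}\,v_{y} + \frac{12-m}{12}\,v_{y-1} \\
    \text{two}  &= \frac{1}{2}\left(\frac{m}{12}\,v_{y} + v_{y-1} + \frac{12-m}{12}\,v_{y-2}\right) \\
    \text{five} &= \frac{1}{5}\left(\frac{m}{12}\,v_{y} + \sum_{k=1}^{4} v_{y-k} + \frac{12-m}{12}\,v_{y-5}\right)
    \end{aligned}
    ```

    For instance, with illustrative values m = 1, v_y = 18 and v_{y-1} = 19, the one-year average is (1/12)(18) + (11/12)(19) = 1.5 + 17.4167, or about 18.92.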

    library(stringi)
    library(SOfun)
    devtools::install_github("hadley/tidyr")
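    library(tidyr)
    library(reshape2)  # melt() is used below but none of the other packages provide it; reshape2 is assumed
    library(dplyr)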
    
    
    ### Big thanks to BondedDust for this function
    ### http://stackoverflow.com/questions/6987478/convert-a-month-abbreviation-to-a-numeric-month-in-r
    
    mo2Num <- function(x) match(tolower(x), tolower(month.abb))
    
    
    ### Arrange the data frame.
    ana <- foo %>%
           mutate(id = 1:n()) %>%
           melt(id.vars = c("id","entry")) %>%
           arrange(id) %>%
           mutate(variable = as.numeric(gsub("^.*_", "", variable)),
                  entryYear = as.numeric(stri_extract_last(entry, regex = "\\d+")),
                  entryMonth = mo2Num(substr(entry, 3,5)) - 1,
                  check = ifelse(variable == entryYear, "Y", "N"))
    
    ### Find a base year for each subject and get some parts of data for each participant.
    indx <- which(ana$check == "Y")
    bob <- getMyRows(ana, pattern = indx, -5:0)
    
    
    ### Get one-year average
    ### (column 4 = value, column 6 = entryMonth; row 6 = base year, row 5 = the year before)
    cathy <- lapply(bob, function(x){
        x$one <- ((x[6,6] / 12) * x[6,4]) + (((12-x[5,6])/12) * x[5,4])
        x 
    })
    
    one <- unnest(lapply(cathy, `[`, i = 6, j = 8))
    
    ### Get two-year average
    cathy <- lapply(bob, function(x){
        x$two <- (((x[6,6] / 12) * x[6,4]) + x[5,4] + (((12-x[4,6])/12) * x[4,4])) / 2
        x 
    })
    
    two <- unnest(lapply(cathy, `[`, i = 6, j =8))
    
    
    ### Get five-year average
    cathy <- lapply(bob, function(x){
        x$five <- (((x[6,6] / 12) * x[6,4]) + x[5,4] + x[4,4] + x[3,4] + x[2,4] + (((12-x[2,6])/12) * x[1,4])) / 5 
        x 
    })
    
    five <- unnest(lapply(cathy, `[`, i =6 , j =8))
    
    ### Combine the results with the key observations
    final <- cbind(ana[which(ana$check == "Y"),], one, two, five)
    colnames(final) <- c(names(ana), "one", "two", "five")
    
    #   id     entry variable value entryYear entryMonth check       one       two      five
    #6   1 07feb2002     2002    18      2002          1     Y 18.916667 18.500000 18.766667
    #14  2 06jun2002     2002    16      2002          5     Y 16.583333 16.791667 17.150000
    #23  3 16apr2003     2003    14      2003          3     Y 15.500000 15.750000 16.050000
    #31  4 26may2003     2003    16      2003          4     Y 16.666667 17.166667 17.400000
    #39  5 11jun2003     2003    13      2003          5     Y 13.583333 14.083333 14.233333
    #48  6 20feb2004     2004     3      2004          1     Y  3.000000  3.458333  3.783333
    #56  7 25jul2004     2004     2      2004          6     Y  2.000000  2.250000  2.700000
    #64  8 19aug2004     2004     4      2004          7     Y  4.000000  4.208333  4.683333
    #72  9 19dec2004     2004     5      2004         11     Y  5.083333  5.458333  4.800000
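
The same weighting can also be sketched in base R without SOfun or tidyr. This is only a minimal sketch under assumptions: `twa` is a hypothetical helper that is not part of the answer above, entry dates are assumed to look like `07feb2002` (ddmmmyyyy), yearly values are assumed to arrive as a numeric vector named by year, and the numbers in the usage lines are made up for illustration.

```r
# Hypothetical helper (not from the original answer): time-weighted average
# of yearly values over the n_years before the entry date.
# values: numeric vector named by year, e.g. c("2001" = 19, "2002" = 18)
# entry:  entry date as a string in ddmmmyyyy form, e.g. "07feb2002"
twa <- function(values, entry, n_years = 1) {
  entry_year <- as.numeric(substr(entry, 6, 9))
  # months of the entry year that passed before the entry month
  m <- match(tolower(substr(entry, 3, 5)), tolower(month.abb)) - 1
  yrs <- as.character(entry_year - 0:n_years)
  stopifnot(all(yrs %in% names(values)))
  v <- unname(values[yrs])  # v[1] = entry year, last element = earliest year
  # partial weight for the entry year, full weight for the years in between,
  # and the complementary partial weight for the earliest year
  w <- c(m / 12, rep(1, n_years - 1), (12 - m) / 12)
  sum(w * v) / n_years
}

# Illustrative values only
vals <- c("2000" = 18, "2001" = 19, "2002" = 18)
twa(vals, "07feb2002", n_years = 1)  # 18.91667
twa(vals, "07feb2002", n_years = 2)  # 18.5
```

Building the weights as a vector keeps one function for any window length, so the one-, two- and five-year averages differ only in the `n_years` argument.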
    