Count observations within dynamic range

前端 未结 2 1566
刺人心
刺人心 2021-01-15 02:47

Consider the following example:

input group day month year number treatment NUM
1 1 2 2000 1 1 2
1 1 6 2000 2 0 .
1 1 9 2000 3 0 .
1 1 5 2001 4 0 .
1 1 1 201         


        
相关标签:
2条回答
  • 2021-01-15 02:59

    The question is not completely clear.

    Consider the following data with two different results, num2 and num3:

      +-----------------------------------------+
      |     date2   group   treat   num2   num3 |
      |-----------------------------------------|
      | 01feb2000       1       1      3      2 |
      | 01jun2000       1       0      .      . |
      | 01sep2000       1       0      .      . |
      | 01nov2000       1       1      0      0 |
      | 01may2002       1       0      .      . |
      | 01jan2010       1       1      1      1 |
      | 01may2010       1       0      .      . |
      |-----------------------------------------|
      | 01jan2001       2       1      0      0 |
      | 01mar2002       2       1      0      0 |
      +-----------------------------------------+
    

    The variable num2 is computed assuming you are interested in counting all observations that are within a one-year period after a treated observation (treat == 1), be those observations equal to 0 or 1 for treat. For example, after 01feb2000, there are three observations that comply with the time span condition; two have treat==0 and one has treat == 1, and they are all counted.

    The variable num3 is also counting observations that are within a one-year period after a treated observation, but only the cases for which treat == 0.

    num2 is computed with code in the spirit of the article you have cited. The use of in makes the run more efficient and there is no gsort (as in your code), which is quite slow. I have assumed that in each group there are no repeated dates:

    clear
    set more off
    
    input ///
    group str15 date count treat num
    1 01.02.2000 1 1 2
    1 01.06.2000 2 0 .
    1 01.09.2000 3 0 .
    1 01.11.2000 3 1 .
    1 01.05.2002 4 0 .
    1 01.01.2010 5 1 1
    1 01.05.2010 6 0 .
    2 01.01.2001 1 1 0
    2 01.03.2002 2 1 0
    end
    
    list
    
    gen date2 = date(date,"DMY")
    format date2 %td
    drop date count num
    
    order date
    list, sepby(group)
    
    
    *----- what you want -----
    
    gen num2 = .
    isid group date, sort
    
    forvalues j = 1/`=_N' {
    
        count in `j'/L if inrange(date2 - date2[`j'], 1, 365) & group == group[`j'] 
    
        replace num2 = r(N) in `j'
    }
    
    replace num2 = . if !treat
    
    list, sepby(group)
    

    num3 is computed with code similar in spirit (and results) as that posted by @jfeigenbaum:

    <snip>
    
    *----- what you want -----
    
    isid group date, sort
    by group: gen indicat = sum(treat)
    
    sort group indicat, stable
    by group indicat: egen num3 = total(inrange(date2 - date2[1], 1, 365))
    replace num3 = . if !treat
    
    list, sepby(group)
    

    Even more than two interpretations are possible for your problem, but I'll leave it at that.

    (Note that I have changed your example data to include cases that probably make the problem more realistic.)

    0 讨论(0)
  • 2021-01-15 03:11

    I am assuming that treatment can't occur within 1 year of the previous treatment (in the group). This is true in your example data, but may not be true in general. But, assuming that it is the case, then this should work. I'm using carryforward which is on SSC (ssc install carryforward). Like your latestDate thought, I determine one year after the most recent treatment and count the number of observations in that window.

    sort group date
    gen yrafter = (date + 365) if treatment == 1
    by group: carryforward yrafter, replace
    format yrafter %td
    gen in_window = date <= yrafter & treatment == 0
    egen answer = sum(in_window), by(group yrafter)
    replace answer = . if treatment == 0
    

    I can't promise this will be faster than a loop but I suspect that it will be.

    0 讨论(0)
提交回复
热议问题