Find first non-missing str value in panel & use value to forward and back fill by group (SAS or PROC SQL)

温柔的废话 2021-01-22 11:36

I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or "wrong" observations of ticker with the latest non-m

2 answers
  • 2021-01-22 11:48

    Consider using a data step to retrieve the last non-missing ticker by time for each id, then join it back to the main table. In the join, use a CASE expression to assign that last ticker only where ticker_have is missing, and keep the existing value otherwise.

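    /* BY-group processing needs the data sorted by id, then time */
    proc sort data=Tickers;
        by id time;
    run;

    /* Keep only the last non-missing ticker row for each id */
    data LastTicker;
        set Tickers (where=(ticker_have ~= ""));
        by id;
        if last.id;
    run;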
    
    proc sql;
        create table Tickers_Want as
        select t.id, t.time, t.ticker_have, 
               case when t.ticker_have = ""
                    then l.ticker_have 
                    else t.ticker_have 
               end as tickerwant
        from Tickers t
        left join LastTicker l
            on t.id = l.id
        order by t.id, t.time;
    quit;
    

    Data

    data Tickers;
       length ticker_have $ 5;
       input id time ticker_have $;
       datalines;
    1   1     ABCDE
    1   2     .    
    1   3     .    
    1   4     YYYYY
    1   5     .    
    2   4     .    
    2   5     ZZZZZ
    2   6     .    
    3   1     .    
    4   2     OOOOO
    4   3     OOOOO
    4   4     OOOOO
    ;
    

    Output

    Obs    id    time    ticker_have    tickerwant
      1     1       1    ABCDE          ABCDE
      2     1       2                   YYYYY
      3     1       3                   YYYYY
      4     1       4    YYYYY          YYYYY
      5     1       5                   YYYYY
      6     2       4                   ZZZZZ
      7     2       5    ZZZZZ          ZZZZZ
      8     2       6                   ZZZZZ
      9     3       1
     10     4       2    OOOOO          OOOOO
     11     4       3    OOOOO          OOOOO
     12     4       4    OOOOO          OOOOO
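
    A single data step can produce the same result without a join. The following is a minimal sketch using a double DO UNTIL ("DOW") loop, a technique not used in the answer above. It assumes Tickers is sorted by id and time, and that a length of 5 is enough for the tickers, as in the sample data; last_ticker and Tickers_Want_DOW are hypothetical names introduced for illustration.

    ```sas
    data Tickers_Want_DOW;
        /* Pass 1: read one id group and remember its last non-missing ticker */
        do until (last.id);
            set Tickers;
            by id;
            if not missing(ticker_have) then last_ticker = ticker_have;
        end;

        /* Pass 2: re-read the same id group and fill the gaps */
        length tickerwant $ 5;   /* assumed length, matches ticker_have in the sample */
        do until (last.id);
            set Tickers;
            by id;
            /* keep the existing ticker, otherwise use the group's last ticker */
            tickerwant = coalescec(ticker_have, last_ticker);
            output;
        end;

        drop last_ticker;
    run;
    ```

    Each iteration of the data step handles one id: the first loop scans the group, the second writes it out. Because last_ticker is not retained, it starts blank for every id, so an id with no ticker at all (id 3 in the sample) keeps a blank tickerwant.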
    
  • 2021-01-22 12:09

    You can do this several ways, but proc sql with some nested sub-queries is one solution.

    (Read it from the inside out: #1, then #2, then #3. You could build each subquery into its own dataset first if that helps.)

    proc sql ;
      create table want as 
      /* #3 - match last ticker on id */
      select a.id, a.time, a.ticker_have, b.ticker_want
      from have a
           left join
            /* #2 - id and last ticker */
           (select x.id, x.ticker_have as ticker_want
            from have x
                 inner join
                  /* #1 - max time with a ticker per id */
                 (select id, max(time) as mt
                  from have
                  where not missing(ticker_have)
                  group by id) as y
                 on x.id = y.id and x.time = y.mt) as b
           on a.id = b.id
      ;
    quit ;
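
    The suggestion to build each subquery into a dataset first could look like the sketch below. It is meant to be equivalent to the nested query, with each numbered step written to its own table; step1 and step2 are hypothetical table names chosen for illustration.

    ```sas
    proc sql ;
      /* #1 - max time with a ticker per id */
      create table step1 as
      select id, max(time) as mt
      from have
      where not missing(ticker_have)
      group by id
      ;

      /* #2 - id and last ticker */
      create table step2 as
      select x.id, x.ticker_have as ticker_want
      from have x
           inner join step1 y
           on x.id = y.id and x.time = y.mt
      ;

      /* #3 - match last ticker on id */
      create table want as
      select a.id, a.time, a.ticker_have, b.ticker_want
      from have a
           left join step2 b
           on a.id = b.id
      ;
    quit ;
    ```

    As in the nested version, ticker_want holds the last ticker of the id on every row, not only on rows where ticker_have is missing, and ids with no ticker at all get a missing ticker_want from the left join.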
    