I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or \"wrong\" observations of ticker with the latest non-m
Consider using a data step to retrieve the last ticker by time for each id, then joining it to main table. Also, use a CASE
statement to conditionally assign new ticker if missing or not.
data LastTicker;
set Tickers (where=(ticker_have ~=""));
by id;
first = first.id;
last = last.id;
if last = 1;
run;
proc sql;
create table Tickers_Want as
select t.id, t.time, t.ticker_have,
case when t.ticker_have = ""
then l.ticker_have
else t.ticker_have
end as tickerwant
from Tickers t
left join LastTicker l
on t.id = l.id
order by t.id, t.time;
quit;
Data
data Tickers;
length ticker_have $ 5;
input id time ticker_have $;
datalines;
1 1 ABCDE
1 2 .
1 3 .
1 4 YYYYY
1 5 .
2 4 .
2 5 ZZZZZ
2 6 .
3 1 .
4 2 OOOOO
4 3 OOOOO
4 4 OOOOO
;
Output
Obs id time ticker_have tickerwant
1 1 1 ABCDE ABCDE
2 1 2 YYYYY
3 1 3 YYYYY
4 1 4 YYYYY YYYYY
5 1 5 YYYYY
6 2 4 ZZZZZ
7 2 5 ZZZZZ ZZZZZ
8 2 6 ZZZZZ
9 3 1
10 4 2 OOOOO OOOOO
11 4 3 OOOOO OOOOO
12 4 4 OOOOO OOOOO
You can do this several ways, but proc sql
with some nested sub-queries is one solution.
(Read it from inside out, #1 then 2 then 3. You could build each subquery into a dataset first if it helps)
proc sql ; create table want as /* #3 - match last ticker on id */ select a.id, a.time, a.ticker_have, b.ticker_want from have a left join /* #2 - id and last ticker */ (select x.id, x.ticker_have as ticker_want from have x inner join /* #1 - max time with a ticker per id */ (select id, max(time) as mt from have where not missing(ticker_have) group by id) as y on x.id = y.id and x.time = y.mt) as b on a.id = b.id ; quit ;