Sum consecutive observations by some variable

Submitted by a 夏天 on 2019-12-07 18:10:28

Here is one solution relying on the retain statement. It is only one among many, and it uses rather advanced techniques that could scare the crap out of a beginner. You have been warned ;)

The use of goto & labels (ending with :) is not very common and in most cases can be avoided. But in a situation like this, it seems warranted, mainly for concision.

data have;
  informat id 4. dose 3. supply 3. date mmddyy8.;
  format date mmddyy10.;
  input id dose supply date;
  datalines;
1234  5 30 01012015
1234 10 30 02012015
1234 10 30 03012015
1234  5 30 04012015
1234  2 30 05012015
4321  5 30 07012016
9876  2 30 05012016
9876  2 30 06012016
9876 10 30 07012016
;

We first make sure our data is properly sorted.

proc sort data=have;
  by id date;
run;

The Solution

The retain statement keeps the values of the listed variables from one iteration of the data step to the next, instead of resetting them to missing before each row of the have data set is read.
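To see what retain does in isolation, here is a minimal sketch of a running total. The demo and running data sets and the x and total variables are made up for illustration and are not part of the original question.

```sas
/* Hypothetical toy data, only used to illustrate retain. */
data demo;
  input x;
  datalines;
1
2
3
;

data running;
  set demo;
  retain total 0;     /* total starts at 0 and keeps its value between iterations */
  total = total + x;  /* without retain, total would be reset to missing each time */
run;
```

With the three rows above, total should come out as 1, 3 and 6. The solution below uses the same mechanism, except that it carries a whole row (id, dose, supply, date) forward rather than a single counter.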

Note that the _i suffix is added to the existing variables from have, i standing for input.

data want(drop=id_i dose_i supply_i date_i);
  format id dose supply 8. date mmddyy10.;
  retain id dose supply date;
  set have(rename=(id=id_i dose=dose_i supply=supply_i date=date_i)) end=last;

  if _N_ = 1 then goto propagate;

  if id_i = id and dose_i = dose then do;
    supply = supply + supply_i;
    goto checklast;
  end;

  * When id or dose is different from previous row, ;
  * we write the observation to the want table.     ;
  output;

  propagate:
  id     = id_i;
  dose   = dose_i;
  supply = supply_i;
  date   = date_i;

  checklast:
  if last then output;
run;

A few things to note here:

  • _N_ is an automatic SAS variable indicating the current iteration number
  • end=last (used as a parameter to the set statement) creates a variable called last (this is an arbitrary name) that will take on value 1 when the last observation is read from have, and 0 otherwise. We use it as a boolean variable at the end of the data step.
  • Keep in mind, in trying to figure this out, that a data step functions just like a for loop, iterating over rows of its source table.
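If these two notions are new, a small sketch like the following may help. It only writes messages to the log and creates no data set; it assumes the have data set defined earlier is available.

```sas
data _null_;
  set have end=last;

  /* _N_ counts iterations of the data step, starting at 1. */
  if _N_ = 1 then put "First row: " id= dose=;

  /* last is 1 only while the final observation of have is being processed. */
  if last then put "Last row, after " _N_ "iterations";
run;
```

The name last is arbitrary here as well; any valid variable name can follow end=.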

Results

id    dose   supply    date
1234    5       30    01/01/2015
1234    10      60    02/01/2015
1234    5       30    04/01/2015
1234    2       30    05/01/2015
4321    5       30    07/01/2016
9876    2       60    05/01/2016
9876    10      30    07/01/2016
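For readers who would rather avoid goto, the same logic can be sketched with an if / else do block. This is only one possible restructuring, and the name want_nogoto is made up for this example; it assumes the sorted have data set from above.

```sas
data want_nogoto(drop=id_i dose_i supply_i date_i);
  format id dose supply 8. date mmddyy10.;
  retain id dose supply date;
  set have(rename=(id=id_i dose=dose_i supply=supply_i date=date_i)) end=last;

  if _N_ > 1 and id_i = id and dose_i = dose then
    supply = supply + supply_i;   /* same id and dose as the previous row: accumulate */
  else do;
    if _N_ > 1 then output;       /* a new run starts: write the previous one */
    id     = id_i;                /* then carry the current row forward */
    dose   = dose_i;
    supply = supply_i;
    date   = date_i;
  end;

  if last then output;            /* write the final run */
run;
```

It is a little longer than the goto version, which is the trade-off mentioned at the start. Running proc compare with base=want and compare=want_nogoto is one way to check that both steps produce the same table.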

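Another approach uses the NOTSORTED option of the by statement, which doesn't require any prior knowledge about the data set: the by values need not be in sorted order, and first.dose and last.dose simply flag where a run of identical id and dose values begins and ends. Note that this version stores the total in a new variable, n_days, and that each output row carries the date and supply of the last observation in its run.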

data have;
informat date mmddyy8.;
format date date9.;
 input dose  id $ supply date ;
 datalines;
5 1234 30 01012015
10 1234 30 02012015
10 1234 30 03012015
5 1234 30 04012015
2 1234 30 05012015
5 4321 30 07012016
2 9876 30 05012016
2 9876 30 06012016
10 9876 30 07012016
 ;
run;

proc sort data=have;
  by id date;
run;

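data want;
  set have;
  by id dose notsorted;

  * Reset the total at the start of each run of identical id and dose. ;
  * first.dose is also set when id changes, so it covers both cases.   ;
  if first.dose then n_days = 0;

  * The sum statement retains n_days implicitly across iterations. ;
  n_days + supply;

  * Write one row per run, once its last observation is reached. ;
  if last.dose then output;
run;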
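To see what the by statement provides here, one can copy the automatic first. and last. flags into ordinary variables and print them. This is just an inspection sketch; the flags data set and the first_id, first_dose and last_dose variables are made-up names.

```sas
data flags;
  set have;
  by id dose notsorted;

  /* first.* and last.* are temporary, so copy them to keep them in the output. */
  first_id   = first.id;
  first_dose = first.dose;
  last_dose  = last.dose;
run;

proc print data=flags;
  var id dose date first_id first_dose last_dose;
run;
```

For id 1234, the two consecutive rows with dose 10 should show first_dose = 1 on the first and last_dose = 1 on the second, while the single rows with dose 5 or 2 have both flags set to 1. Whenever first_id is 1, first_dose is 1 as well.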