Sum consecutive observations by some variable

I'm just learning to use SAS, so bear with me a bit. I have the following sample patient data on prescription usage and I'd like to try to combine observations to form more of a patient story, but keep the timeline intact:

data have;
 input dose $2. id $4. supply date $8.;
 datalines;
 "5" 1234 30 01012015
 "10" 1234 30 02012015
 "10" 1234 30 03012015
 "5" 1234 30 04012015
 "2" 1234 30 05012015
 "5" 4321 30 07012016
 "2" 9876 30 05012016
 "2" 9876 30 06012016
 "10" 9876 30 07012016
 ;
run;

Where dose is the dosage in mg, id is patient ID, supply is the number of days' supply of the medication, and date is the date of the refill.

I'd like to consolidate some of the observations so that when we look at patient 1234 we can see they were taking 5mg for 30 days, then 10mg for 60 days, then 5 mg again for 30 days, etc. All of the summation and group by commands I've learned would combine observations 1 and 4 together, but the patient story was that the dosage was increased and then decreased, and I'd like to keep that intact but don't know how.

So it would look like this:

data want;
 input dose $2. id $4. supply date $8.;
 datalines;
 "5" 1234 30 01012015
 "10" 1234 60 02012015
 "5" 1234 30 04012015
 "2" 1234 30 05012015
 "5" 4321 30 07012016
 "2" 9876 60 05012016
 "10" 9876 30 07012016
 ;
run;

See observation 3 rolled up into 2, 8 into 7, etc.

Any tips would be greatly appreciated!

Here is one solution relying on retain variables. It is only one among many, and it uses rather advanced techniques that could scare the crap out of a beginner. You have been warned ;)

The use of goto & labels (ending with :) is not very common and in most cases can be avoided. But in a situation like this, it seems warranted, mainly for concision.

data have;
  informat id 4. dose 3. supply 3. date mmddyy8.;
  format date mmddyy10.;
  input id dose supply date;
  datalines;
1234  5 30 01012015
1234 10 30 02012015
1234 10 30 03012015
1234  5 30 04012015
1234  2 30 05012015
4321  5 30 07012016
9876  2 30 05012016
9876  2 30 06012016
9876 10 30 07012016
;

We first make sure our data is properly sorted.

proc sort data=have;
  by id date;
run;

The Solution

The retain statement keeps the values of the listed variables in memory from one iteration of the data step to the next, instead of resetting them to missing each time a new row of have is read.
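As a minimal sketch of what retain does on its own, separate from the solution itself, the step below keeps a running total of supply per patient. The data set name running and the variable running_total are made up for illustration, and the step assumes have is already sorted by id.

```sas
/* Illustrative only: running and running_total are hypothetical names. */
data running;
  set have;
  by id;
  /* Without retain, running_total would be reset to missing on every row. */
  retain running_total;
  /* Start a fresh total at the first row of each patient. */
  if first.id then running_total = 0;
  running_total = running_total + supply;
run;
```

Removing the retain statement from this sketch should leave running_total missing on every row except the first of each id, which is a quick way to see its effect.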

Note that the variables read from have are renamed with an _i suffix (i standing for input), so the original names are free to hold the retained values.

data want(drop=id_i dose_i supply_i date_i);
  format id dose supply 8. date mmddyy10.;
  retain id dose supply date;
  set have(rename=(id=id_i dose=dose_i supply=supply_i date=date_i)) end=last;

  if _N_ = 1 then goto propagate;

  if id_i = id and dose_i = dose then do;
    supply = supply + supply_i;
    goto checklast;
  end;

  * When id or dose is different from previous row, ;
  * we write the observation to the want table.     ;
  output;

  propagate:
  id     = id_i;
  dose   = dose_i;
  supply = supply_i;
  date   = date_i;

  checklast:
  if last then output;
run;

A few things to note here:

_N_ is an automatic SAS variable holding the current iteration number of the data step.
end=last (used as an option on the set statement) creates a variable called last (this is an arbitrary name) that takes the value 1 when the last observation is read from have, and 0 otherwise. We use it as a boolean variable at the end of the data step.
Keep in mind, when trying to figure this out, that a data step functions just like a for loop, iterating over the rows of its source table.
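To see _N_ and end= in isolation, here is a small sketch that only writes to the log and creates no data set. It assumes the have data set defined earlier; the message text is arbitrary.

```sas
/* Illustrative only: data _null_ runs the loop without creating a table. */
data _null_;
  set have end=last;
  /* _N_ is 1 on the first pass through the data step. */
  if _N_ = 1 then put 'First iteration: ' id= dose=;
  /* last is 1 only while the final observation of have is being processed. */
  if last then put 'Last iteration, _N_ = ' _N_;
run;
```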

Results

id    dose   supply    date
1234    5       30    01/01/2015
1234    10      60    02/01/2015
1234    5       30    04/01/2015
1234    2       30    05/01/2015
4321    5       30    07/01/2016
9876    2       60    05/01/2016
9876    10      30    07/01/2016

Another approach uses the NOTSORTED option of the BY statement, which treats each run of consecutive rows with the same values as a group and doesn't require any predetermined knowledge about the data set.

data have;
informat date mmddyy8.;
format date date9.;
 input dose  id $ supply date ;
 datalines;
5 1234 30 01012015
10 1234 30 02012015
10 1234 30 03012015
5 1234 30 04012015
2 1234 30 05012015
5 4321 30 07012016
2 9876 30 05012016
2 9876 30 06012016
10 9876 30 07012016
 ;
run;

proc sort data=have;
  by id date;
run;

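data want;
  set have;
  by id dose notsorted;
  retain n_days;

  /* Reset the running total at the start of each run of identical id and dose. */
  if first.dose or first.id then n_days = 0;

  /* Sum statement: adds supply to n_days on every row. */
  n_days + supply;

  /* Write one row per run. n_days holds the total days of supply, */
  /* while supply and date keep the values of the last row in the run. */
  if last.dose then output;
run;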
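The step above leaves the total in n_days, while supply and date carry the values of the last row in each run. If the output should match the layout asked for in the question (total in supply, date of the first refill in the run), one possible variant is sketched below. The names want_alt and start_date are made up for illustration, and the step assumes the same sorted have data set.

```sas
/* Sketch only: want_alt and start_date are hypothetical names. */
data want_alt(drop=supply date rename=(n_days=supply start_date=date));
  set have;
  by id dose notsorted;
  retain start_date;
  format start_date mmddyy10.;

  /* At the start of each run, reset the total and remember the first refill date. */
  if first.dose then do;
    n_days = 0;
    start_date = date;
  end;

  /* Sum statement: accumulates supply and implicitly retains n_days. */
  n_days + supply;

  /* One row per run of consecutive identical id and dose. */
  if last.dose then output;
run;
```

The drop= option is applied before rename=, so the original supply and date are discarded and the accumulated values take over their names.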

Source: https://stackoverflow.com/questions/45361194/sum-consecutive-observations-by-some-variable

Tags

sas

retain