Summing a Column By Group In a Dataset With Macros

前端 未结 3 427
说谎
说谎 2021-01-27 11:34

I have a dataset that looks like:

 Month   Cost_Center      Account    Actual    Annual_Budget
 June     53410           Postage       13      234
 June     5342         


        
相关标签:
3条回答
  • 2021-01-27 12:10

    SAS DATA step behavior is quite complex ("About DATA Step Execution" in SAS Language Reference: Concepts). The default behavior, that you're seeing, is: at the end of each iteration (i.e. for each input row) the row is written to the output data set, and the PDV - all data step variables - is reset.

    You can't expect to write Base SAS "intuitively" without spending a few days learning it first, so I recommend using PROC SQL, unless you have a reason not to.

    If you really want to aggregate in data step, you have to use something called BY groups processing: after ensuring the input data set is sorted by the BY vars, you can use something like the following:

    data Test (keep = Month Account Sum_Actual Sum_Annual /*...your Run_Rate and Over_Budget_Alarm...*/);
        set Combined; /* the input table */
        by Month Account; /* must be sorted by these */
    
        retain Sum_Actual Sum_Annual; /* don't clobber for each input row */
        if first.account then do; /* instead do it manually for each group */
            Sum_Actual = 0;
            Sum_Annual = 0;
        end;
    
        /* accumulate the values from each row */
        Sum_Actual = sum(Sum_Actual, Actual);
        Sum_Annual = sum(Sum_Annual, Annual_Budget);
        /* Note that Sum_Actual = Sum_Actual+Actual; will not work if any of the input values is 'missing'. */
    
        if last.account then do;
            /* The group has been processed.
               Do any additional processing for the group as a whole, e.g.
               calculate Over_Budget_Alarm. */
            output; /* write one output row per group */
        end;
    run;
    
    0 讨论(0)
  • 2021-01-27 12:12

    Proc SQL can be very effective for understanding aggregate data examination. With out seeing what the macros do, I would say perform the run rate checks after outputting data set test.

    You don't show rows for other months, but I must presume the annual_budget values are constant across all months -- if so, I don't see a reason to ever sum annual_budget; comparing anything to sum(annual_budget) is probably at the incorrect time scale and not useful.

    From the show data its hard to tell if you want to know any of these

    • which (or if some) months had a run_rate that exceeded the annual_budget
    • which (or if some) months run_rate exceeded the balance of annual_budget (i.e. the annual_budget less the prior months expenditure)

    Presume each row in test is for a single year/month/costCenter/account -- if not the underlying data would have to be aggregated to that level.

    Proc SQL;
      * retrieve presumed constant annual_budget values from data;
      * this information might (should) already exist in another table;
    
      * presume constant annual budget value at each cost center | account combination;
      * distinct because there are multiple months with the same info;
    
      create table annual_budgets as
      select distinct Cost_Center, Account, Annual_Budget
      from test;
    
      create table account_budgets as
      select account, sum(annual_budget) as annual_budget 
      from annual_budgets
      group by account;
    
      * flag for some run rate condition;
    
      create table annual_budget_mon_runrate_check as
      select 
        2019 as year,
        account,
        sum(actual) as yr_actual,  /* across all month/cost center */
        min (
          select annual_budget from account_budgets as inner
          where inner.account = outer.account
        ) as account_budget,
    
        max (
          case when actual * 12 > annual_budget then 1 else 0 end
        ) as
          excessive_runrate_flag label="At least one month had a cost center run rate that would exceed its annual_budget")
      from 
        test as outer
      group by
        year, account;
    

    You can add a where clause to restrict the accounts processed.

    Changing the max to sum in the flag computation would return the number of cost center months with excessive run rates.

    0 讨论(0)
  • 2021-01-27 12:16
    1. Use PROC MEANS to summarize the data
    2. Use a data step and IF/THEN statement to create your flags.

      proc means data=have N SUM NWAY STACKODS;
         class account;
         var amount annual_budget;
         ods output summary = summary_stats1;
         output out = summary_stats2 N = SUM= / AUTONAME;
      run;
      
      data want;
        set summary_stats;
        if sum_actual > sum_annual_budget then flag=1; 
        else flag=0;
      run;
      
    0 讨论(0)
提交回复
热议问题