SAS, sum by group

旧时模样 posted on 2021-02-10 09:11:35

Question


I want to calculate the sum of var3 by var1. Could you show two methods for the calculation: SQL, and a data step with if first.var1?

data have;
input var1 var2$ var3;
datalines;

1 a 3
1 a 4
1 a 3
2 b 5
2 b 3
3 c 1
;
run;

data want;
input var1 var2 $ var3 sum_by_var1;
datalines;

1 a 3 10
1 a 4 10
1 a 3 10
2 b 5 8
2 b 3 8
3 c 1 1
;
run;

My two attempts:

The code below works on this small data set, but I am not sure it will work on large data sets, where the results are hard to check.

proc sql;
 create table new as 
 select 
 *
 ,sum(var3) as sum_by_var1
 from have
 group by var1
 order by var1
 ;
quit;

The code below doesn't work

data new2;   
   set have;       
   by var1; 
   if first.var1 then
   by_var1 + var3;
run;

Answer 1:


To fix your data step calculation you need to:

  1. Use the RETAIN statement so that sum_by_var1 keeps its value across rows while the sum by var1 accumulates, and reset it to 0 on the first row of each group,
  2. Use the OUTPUT statement to write a row only once the sum for var1 is complete, that is, on the last observation for var1,
  3. If you want the sum on every original row rather than one row per group, join the result back to your Have table.

Fix:

data new2;   
   set have;       
   by var1; 
   retain sum_by_var1;
   if first.var1 then do; sum_by_var1=0; end;
   sum_by_var1 + var3;
   if last.var1 then do; output; end;
run;

Output:

var1=1 var2=a var3=3 sum_by_var1=10 
var1=2 var2=b var3=3 sum_by_var1=8 
var1=3 var2=c var3=1 sum_by_var1=1 
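
Step 3 above (joining back to Have) is not shown in the fix. A minimal sketch, assuming NEW2 was created by the data step above and HAVE is already sorted by var1; the data set name want_joined is made up for illustration:

```sas
/* One-to-many merge: NEW2 has one row per var1, HAVE has many.   */
/* Keeping only var1 and sum_by_var1 from NEW2 avoids overwriting */
/* var2 and var3 from HAVE.                                       */
data want_joined;
   merge have new2(keep=var1 sum_by_var1);
   by var1;
run;
```

Each row of HAVE should then carry the sum for its var1 group.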



Answer 2:


Here are two fully worked examples, first without and then with a grouping variable. One method uses PROC MEANS plus a data step and the other uses PROC SQL. In these examples I'm computing an average, but you can replace the word mean with SUM to get your desired result.

******************************************************;
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;

proc means data=sashelp.class noprint;
    output out=avg_values mean(height)=avg_height;
run;

data class_data;
    set sashelp.class;

    if _n_=1 then
        set avg_values;
run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - the log shows a NOTE about remerging summary statistics;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class;
quit;

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - the log shows a NOTE about remerging summary statistics;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;
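
Applied to the data in the question, the PROC MEANS approach with SUM in place of MEAN might look like the sketch below. It assumes HAVE is already sorted by var1 (as it is in the question); the data set names sum_values and have_with_sum are made up for illustration:

```sas
/* One row per var1 holding the group total of var3 */
proc means data=have noprint nway;
   class var1;
   output out=sum_values(keep=var1 sum_by_var1) sum(var3)=sum_by_var1;
run;

/* Merge the totals back onto every original row */
data have_with_sum;
   merge have sum_values;
   by var1;
run;
```

The KEEP= option drops the automatic _TYPE_ and _FREQ_ variables that PROC MEANS adds to the output data set.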



Answer 3:


When the sum statement (variable + expression) is used to accumulate, you need to reset the accumulator to missing at the start of each group. Additionally, since you want the group sum on every row in the group, the sum must be computed before any row of that group is output. The double DOW loop is a common solution. The DOW approach places the SET and BY statements inside a DO loop. Here "double" means there is one loop that computes a statistic over the group and a second loop that re-reads the same rows and outputs them.

data want;
   * loop over all rows in group to compute the sum;
   var3_sum_over_var1 = .;
   do _n_ = 1 by 1 until (last.var1);
     set have;
     by var1;
     var3_sum_over_var1 + var3;
   end;

   * associate var3_sum_over_var1 with each row as it is output;
   do _n_ = 1 to _n_;
     set have;
     OUTPUT;
   end;
run;

Note: The start and stop bounds of a DO loop are evaluated once, before the first iteration, and cannot be changed while the loop is iterating. That is why do _n_ = 1 to _n_; works as desired.
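
A small way to see this behavior for yourself, separate from the answer's code (the variable names n and i are arbitrary):

```sas
data _null_;
   n = 3;
   do i = 1 to n;
      n = 10;   /* changing n here does not change the stop bound */
      put i=;
   end;
run;
```

The log should show i=1, i=2 and i=3 only, because the stop value was fixed at 3 before the loop started.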



Source: https://stackoverflow.com/questions/50032292/sas-sum-by-group
